YosefLab / Cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
https://cassiopeia-lineage.readthedocs.io/en/latest/
MIT License

Call lineages questions #249

Closed JReneCWong closed 1 month ago

JReneCWong commented 1 month ago

Hello,

Following the first part of the conversation in #110 (https://github.com/YosefLab/Cassiopeia/issues/110#issue-949265786):

We are working with an editable barcode in which edits/cuts can occur at any position in the sequence, and the construct lacks an intBC, so we set this parameter to (0,0) in the Call Alleles module. We are able to run the Cassiopeia pre-processing pipeline; however, in the Call Lineages module we obtain only 2 cell groups/populations (with more than 1,000 cells), and we were expecting more.

It may also be important to mention that in the Call Alleles module we get the following warning: "PreprocessWarning: Detected missing data in alleles. You might consider re-running align_sequences with a lower gap-open penalty, or using a separate alignment strategy." We have set this parameter to 3 and we are still getting this warning.

We explored the modules and the umi_table across them, thinking that a possible reason for this low number of cell populations is that we would need an intBC (as @mattjones315 mentioned in #110). With this in mind, I have the following questions:

  1. Is it possible to add an "artificial" intBC to the umi_table? In our case, we want every cell to be treated as a single population.
  2. Do you have any recommendations for the Call Alleles module, given that the cut sites are not exactly known?
  3. Following the manual, once we get the allele table, we want to convert it to a character matrix, continue to the tree solver, and then build the tree. However, is the tree built for just one cell group/clone? Could it be built for all of them?

We would really appreciate some guidance. Thank you very much for your time!

-Rene :)

mattjones315 commented 1 month ago

Hello @JReneCWong, thanks for using Cassiopeia and posting this issue!

It sounds like based on your use case, you don't have to run the call lineages step because you assume all the cells are in one allele table. However, if you want to remove doublets and do some intBC correction as implemented in the call-lineages step, you can set a "dummy" intBC as you suggest and that should work.

Regarding character matrix formation, you can pass the resulting allele table straight to cas.pp.convert_alleletable_to_character_matrix. We typically create a character matrix for each unique clonal group, but in your case since all cells are part of one clone this is not a concern. More generally, the tree reconstruction will build a tree for whatever is in the character matrix so you have flexibility there.
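
As a rough illustration of the two ideas above (a constant "dummy" intBC, and one character matrix covering all cells), here is a simplified pure-Python sketch. It does not call Cassiopeia. The row layout (cellBC, intBC, r1, r2), the made-up indel strings, the "dummy" label, and the helper to_character_matrix are all assumptions for illustration; the real cas.pp.convert_alleletable_to_character_matrix works on the actual allele table and handles much more (missing data, priors, and so on).

```python
# Simplified illustration only; this is NOT Cassiopeia's implementation.
# Hypothetical allele-table rows: one row per cell, two cut sites (r1, r2).
# The indel strings are made up; "NONE" stands for an unedited site.
rows = [
    {"cellBC": "cellA", "r1": "NONE", "r2": "5:2D"},
    {"cellBC": "cellB", "r1": "3:1I", "r2": "5:2D"},
    {"cellBC": "cellC", "r1": "3:1I", "r2": "NONE"},
]

# Give every row the same placeholder intBC so all cells share one barcode group.
for row in rows:
    row["intBC"] = "dummy"


def to_character_matrix(rows, cut_sites=("r1", "r2")):
    """Map each (intBC, cut site) pair to a character and each indel to an integer state."""
    state_ids = {}  # character -> {indel: integer state}
    matrix = {}     # cellBC -> {character: integer state}
    for row in rows:
        for site in cut_sites:
            character = f"{row['intBC']}_{site}"
            indel = row[site]
            if indel == "NONE":
                state = 0  # unedited sites get state 0
            else:
                ids = state_ids.setdefault(character, {})
                # Reuse the state for a known indel, otherwise assign the next integer.
                state = ids.setdefault(indel, len(ids) + 1)
            matrix.setdefault(row["cellBC"], {})[character] = state
    return matrix


character_matrix = to_character_matrix(rows)
for cell, states in character_matrix.items():
    print(cell, states)
```

Because every row carries the same placeholder intBC, the characters reduce to the cut sites themselves, and all cells end up in a single matrix that a solver could consume as one clone.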

Hope this helps and let me know if you have any other questions I can help with!

Best, Matt

JReneCWong commented 1 month ago

Hello Matt,

Thanks a lot for your reply! Do you have any recommendations about the warning we are getting?

"PreprocessWarning: Detected missing data in alleles. You might consider re-running align_sequences with a lower gap-open penalty, or using a separate alignment strategy." We have set this parameter to 3 and we are still getting this warning.

Also, do you have any recommendations on how to set the parameters for calling alleles, given that we are not sure of the cut positions in our system?

Thanks!

-Rene

JReneCWong commented 1 month ago

Hello Matt,

I followed your advice about transforming the allele table to a character matrix. However, we got the following error:

vanilla_greedy = cas.solver.VanillaGreedySolver()
vanilla_greedy.solve(cas_tree, collapse_mutationless_edges=True)
Traceback (most recent call last):
  File "", line 1, in
  File "/shared/renewong/miniconda2/envs/Cassiopeia/lib/python3.8/site-packages/cassiopeia/solver/GreedySolver.py", line 190, in solve
    _solve(
  File "/shared/renewong/miniconda2/envs/Cassiopeia/lib/python3.8/site-packages/cassiopeia/solver/GreedySolver.py", line 115, in _solve
    self.perform_split(
  File "/shared/renewong/miniconda2/envs/Cassiopeia/lib/python3.8/site-packages/cassiopeia/solver/VanillaGreedySolver.py", line 113, in perform_split
    weights[character][state]
KeyError: 1
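
The traceback ends in a lookup of the form weights[character][state], so "KeyError: 1" means one of the two mappings had no entry for key 1. The traceback alone does not say whether that was the character level or the state level, or why. One way to narrow it down is to compare the states present in the character matrix against the weights mapping. Below is a minimal pure-Python sketch of such a check. The helper find_missing_weights, the data, and the assumption that uncut (0) and missing (-1) states need no weight are hypothetical; with real data the matrix and weights would come from your own tree object.

```python
def find_missing_weights(character_matrix, weights, missing_state=-1, uncut_state=0):
    """Return sorted (character, state) pairs seen in the matrix but absent from weights."""
    missing = set()
    for states in character_matrix:  # one row of states per cell
        for character, state in enumerate(states):
            if state in (missing_state, uncut_state):
                continue  # assumption: no weight is looked up for these states
            # Covers both failure modes: unknown character or unknown state.
            if state not in weights.get(character, {}):
                missing.add((character, state))
    return sorted(missing)


# Made-up example: 3 cells x 2 characters; weights[character][state] -> weight.
character_matrix = [[1, 0], [1, 2], [-1, 1]]
weights = {0: {1: 0.5}, 1: {2: 0.3}}
print(find_missing_weights(character_matrix, weights))
```

An empty result would suggest the weights cover every observed state; any pair returned is a candidate for the lookup that failed.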

I also followed the idea of using a "dummy" intBC (using the cell ID) to try executing the modules after "Call Alleles", but we are still getting 1-2 cell groups.

Regarding "PreprocessWarning: Detected missing data in alleles. You might consider re-running align_sequences with a lower gap-open penalty, or using a separate alignment strategy": as I mentioned, I have tried different values for the gap-open penalty. Given that we do not know exactly where our system could edit the barcode, what would you recommend for setting the parameters in the "Call Alleles" module? I think we should proceed with the modules after this one, given that we (ideally) want just one barcode per cell and, if I understood correctly, up to Call Alleles the different alleles in a cell have not yet been collapsed.

Thank you so much for your time and help!