cbg-ethz / COMPASS

GNU General Public License v3.0

Report per-cell clone assignments? #1

Closed by murphycj 1 year ago

murphycj commented 1 year ago

Hello!

Really interesting tool. Basic question, but does COMPASS output a file containing the per-cell clone assignments as well as the genotypes in each clone? I ask because that would make it easier to programmatically use the data for other downstream tasks.

Thanks!

e-sollier commented 1 year ago

Hi,

Thanks for your interest in COMPASS! COMPASS used to output only the tree in Graphviz format, but I have just added other outputs, including assignments of cells to nodes and genotypes of nodes. The outputs are now described in the readme, and you can see an example of the output generated by COMPASS at https://github.com/cbg-ethz/COMPASS/tree/master/data/output. I hope this helps, and I'd be happy to help if you have further questions.

Cheers, Etienne

murphycj commented 1 year ago

That is exactly what I wanted, thank you!

One small question. I see in your _cellAssignments.tsv example file that the node assignment for one cell is 25, which is not listed in the _genotypes.tsv file. I presume that node is a doublet, which is listed in the _cellAssignmentsProbs.tsv file?

Example: https://github.com/cbg-ethz/COMPASS/blob/master/data/output/AML-59-001_cellAssignments.tsv#L318

e-sollier commented 1 year ago

Ah, yes, this cryptic 25 did indeed correspond to a doublet. I was not sure whether to include doublets in the output. Internally, COMPASS can assign a cell either to a node or to a doublet (i.e. a pair of nodes), but I think that for users it might be easier if cells are only assigned to singlets in the output.

I've just changed the output. Now in the output tsv files, cells are only assigned to singlets (and the probabilities in _cellAssignmentsProbs.tsv are recomputed by only normalizing over singlets), but I added one column in _cellAssignments.tsv to indicate whether or not the cell was inferred to be a doublet.

Hopefully this new output is easier to work with, but if the full doublet information is useful to you, I could add it back (and explicitly display the two nodes corresponding to each doublet).
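For illustration only, the renormalization over singlets described above could look roughly like the Python sketch below. This is not COMPASS's actual code: the function name `summarize_cell`, the `"0+1"` label for a doublet, and the rule that a cell is flagged as a doublet when its most probable assignment overall is a doublet are all assumptions made for the example.

```python
def summarize_cell(probs, singlet_nodes):
    """Collapse one cell's assignment probabilities onto singlet nodes.

    probs: dict mapping an assignment label (singlet or doublet) to its probability.
    singlet_nodes: set of labels that are real tree nodes (singlets).
    Returns (best singlet label, doublet flag, renormalized singlet probabilities).
    """
    # Keep only the probability mass that sits on singlet nodes.
    singlets = {label: p for label, p in probs.items() if label in singlet_nodes}
    total = sum(singlets.values())
    if total <= 0:
        raise ValueError("no probability mass on singlet nodes")

    # Renormalize so that the singlet probabilities sum to 1.
    renormalized = {label: p / total for label, p in singlets.items()}
    best_singlet = max(renormalized, key=renormalized.get)

    # Assumed rule: the cell counts as a doublet if its most probable
    # assignment overall is not a singlet node.
    best_overall = max(probs, key=probs.get)
    is_doublet = best_overall not in singlet_nodes
    return best_singlet, is_doublet, renormalized


# Hypothetical cell: nodes "0" and "1" are singlets, "0+1" is the doublet of both.
cell_probs = {"0": 0.2, "1": 0.3, "0+1": 0.5}
node, is_doublet, probs = summarize_cell(cell_probs, {"0", "1"})
print(node, is_doublet, probs)
```

In this toy case the cell is reported at its most likely singlet node while still being flagged as a doublet, which mirrors the idea of the extra column in _cellAssignments.tsv.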

murphycj commented 1 year ago

Thanks for the clarification and update. I think the changes you made are sufficient. Right now I cannot think of a reason to need the full doublet information.