YosefLab / Cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
https://cassiopeia-lineage.readthedocs.io/en/latest/
MIT License

v4 chemistry #250

Open diazlab opened 5 days ago

diazlab commented 5 days ago

Hi, I would like to run Cassiopeia preprocessing on a 10X v4 chemistry library. Based on my understanding of your code and of the v4 vs. v3 library structure, it seems that I could run cas.pp.convert_fastqs_to_unmapped_bam with chemistry='10xv3' to generate the needed BAMs, and then provide the v4 cell barcode whitelist to cas.pp.error_correct_cellbcs_to_whitelist. Otherwise, I could execute the pipeline as described in the tutorials without further modification. Is that right?

Also, I don't quite understand the difference in the code between the chemistry='10xv3' and '10xv2' invocations of convert_fastqs_to_unmapped_bam. I can't seem to find the instantiation of the ngs.chemistry object. Can you point me to that so I can implement code for v4 if necessary?

thanks

tzeitim commented 4 days ago

Also, I don't quite understand the difference in the code between the chemistry='10xv3' and '10xv2' invocations of convert_fastqs_to_unmapped_bam. I can't seem to find the instantiation of the ngs.chemistry object. Can you point me to that so I can implement code for v4 if necessary?

https://github.com/Lioscro/ngs-tools/blob/aa3e864e59ae78467a331f671967c93d62a6e2ad/ngs_tools/chemistry/SingleCellChemistry.py#L125

mattjones315 commented 3 days ago

Hi @diazlab,

Thanks so much for using Cassiopeia and posting this issue!

The major difference between v2 and v3 chemistry, for the purpose of processing libraries in Cassiopeia, is the extension of the UMI sequence from 10 nt to 12 nt. It sounds like the v4 chemistry has an identical R1 structure to v3 (judging from this resource from the ever-helpful Teichmann Lab). So @diazlab, I believe you are correct that you can run Cassiopeia here using the '10xv3' chemistry setting while passing in the v4 cellBC whitelist for cellBC error correction.
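
To make the R1 layout difference concrete, here is a minimal sketch of how a read 1 sequence splits into cell barcode and UMI under each chemistry. It is not Cassiopeia or ngs-tools code. The names `split_read1`, `ParsedRead1`, and `UMI_LENGTHS` are made up for illustration, the 16 nt cell barcode length is the standard 10x layout, and the `"10xv4"` entry is a hypothetical label that simply mirrors v3, on the assumption stated above that the two share the same R1 structure.

```python
from typing import NamedTuple

# 10x cell barcodes occupy the first 16 nt of read 1.
CELL_BARCODE_LENGTH = 16

# UMI lengths per chemistry as described in this thread.
# "10xv4" is a hypothetical key for illustration; it assumes v4 matches v3.
UMI_LENGTHS = {"10xv2": 10, "10xv3": 12, "10xv4": 12}


class ParsedRead1(NamedTuple):
    cell_barcode: str
    umi: str


def split_read1(sequence: str, chemistry: str) -> ParsedRead1:
    """Split a read 1 sequence into (cell barcode, UMI) for a chemistry."""
    if chemistry not in UMI_LENGTHS:
        raise ValueError(f"Unknown chemistry: {chemistry!r}")
    umi_end = CELL_BARCODE_LENGTH + UMI_LENGTHS[chemistry]
    if len(sequence) < umi_end:
        raise ValueError(
            f"Read 1 has {len(sequence)} nt, need at least {umi_end} for {chemistry}"
        )
    return ParsedRead1(
        cell_barcode=sequence[:CELL_BARCODE_LENGTH],
        umi=sequence[CELL_BARCODE_LENGTH:umi_end],
    )


# Toy read: 16 nt barcode, 12 nt UMI, then trailing bases.
read1 = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "TTTTTTTT"

v2 = split_read1(read1, "10xv2")
v3 = split_read1(read1, "10xv3")
v4 = split_read1(read1, "10xv4")
print(v2.umi, v3.umi, v4.umi)
```

Under this assumption the v3 and v4 splits are identical, so the only chemistry-specific input left is the whitelist used for cell barcode correction.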

As @tzeitim pointed out, the ngs.chemistry object is implemented in a separate codebase and linked above. Thanks for linking that, @tzeitim!

Please let me know how this works, and if there are any unanticipated issues you run into that I can help with.

Best, Matt