chung-lab / SCAFE

Single Cell Analysis of Five-prime Ends
MIT License

non-10x compatibility #14

Closed ysun-8 closed 2 years ago

ysun-8 commented 2 years ago

Hi,

I have a question regarding potential compatibility with non-10x platforms, but still 5' end sequencing. In particular, I have 5' end reads in a sorted .bam file that look like barcode-UMI-GGG-5' end (22 bp - 8 bp - 3 bp - 73 bp for a total of 106 bp).

Will it be possible to run SCAFE in this case? Is there a way to bypass the scafe.tool.cm.remove_strand_invader step in performing scafe.workflow.sc.solo? In addition, what changes might I need to make in scafe.tool.sc.bam_to_ctss in order to correctly identify the 5' end?

Thanks and much appreciated.

chung-lab commented 2 years ago

Apologies for our very late response, and thanks for using SCAFE.

The updated version (v1.0.0) of scafe.tool.sc.bam_to_ctss should be able to deal with non-10x data. I would suggest trimming the barcode-UMI off your fastq, leaving only GGG-5' end-cDNA, and then remapping; it should then work with --detect_TS_oligo=auto
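To make the trimming step concrete, here is a minimal sketch in plain Python, assuming the 22 bp barcode and 8 bp UMI sit at the very start of every read, as described in the question. The function names `trim_record` and `trim_fastq` are hypothetical helpers, not part of SCAFE, and how the barcode and UMI are carried forward for later steps (for example in the read name) is not covered here.

```python
# Read layout from the question: barcode (22 bp) - UMI (8 bp) - GGG - 5' end.
BARCODE_LEN = 22
UMI_LEN = 8


def trim_record(header, seq, plus, qual, n=BARCODE_LEN + UMI_LEN):
    """Drop the first n bases and their quality scores from one FASTQ record."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= n:
        raise ValueError("read is too short to trim")
    return header, seq[n:], plus, qual[n:]


def trim_fastq(lines, n=BARCODE_LEN + UMI_LEN):
    """Yield trimmed FASTQ lines from an iterable of lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("incomplete FASTQ record") from None
        record = (header, seq, plus, qual)
        # Strip newlines so the length checks compare bases, not line endings.
        yield from trim_record(*(field.rstrip("\n") for field in record), n=n)
```

After trimming, each read starts at the GGG, which is the layout the auto detection mode is suggested for above. A dedicated read trimmer would do the same job on large files; this is only meant to show which bases are removed.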

Alternatively, if you would rather not trim your fastq and remap, you can supply a dummy TSO sequence and have bam_to_ctss skip checking the TSO sequence and blindly trim by the length of the TSO, with the options: --detect_TS_oligo=trim --TS_oligo_seq=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

I used 22 + 8 = 30 As as the dummy TSO sequence because your barcode and UMI are 22 bp and 8 bp.
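If your barcode or UMI lengths differ, the dummy sequence can be generated rather than typed by hand. The snippet below is only a sketch: `dummy_tso` is a hypothetical helper, and the two option strings are the ones quoted in this comment.

```python
BARCODE_LEN = 22  # barcode length from the question
UMI_LEN = 8       # UMI length from the question


def dummy_tso(length):
    """Return a placeholder TSO of the given length, made only of As."""
    if length <= 0:
        raise ValueError("length must be positive")
    return "A" * length


tso = dummy_tso(BARCODE_LEN + UMI_LEN)
options = ["--detect_TS_oligo=trim", f"--TS_oligo_seq={tso}"]
print(" ".join(options))
```

Only the length of the dummy sequence matters in trim mode, since the sequence itself is not checked.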

You cannot skip remove_strand_invader in the workflow. Instead, you'll need to run scafe.tool.cm.cluster directly on the output of scafe.tool.sc.bam_to_ctss.

Suggestions are welcome, and feel free to ask for help.