Mapped/Unmapped Reads

jamesboot commented 3 years ago

Hello, we recently got some 10X scRNAseq data containing 6 different sample hashtags. Running the data through the 10X pipeline, we found that only 2 of the sample hashtags were detected. We therefore decided to run the samples through CITE-seq-Count to see whether we could reproduce the problem. On my first pass with CITE-seq-Count, 100% of my reads were unmapped. After reading through the help docs and other posts (issue #62), I added "--start-trim 10" to my script and changed -umil from 26 to 28. To save computing time and quickly check whether this fixed the issue, I also ran the script on only the first 1,000,000 reads using "--first_n". The output now reports a mapped-read percentage of 240 and an unmapped percentage of 10! I'm concerned that the mapped percentage is greater than 100, and I wanted to check whether this is expected before re-running the script on all reads. Below are the parameters I used the first time (100% unmapped reads) and the second time (240% mapped reads). Any help would be much appreciated!

Run 1: CITE-seq-Count -R1 /path_to_read1.fastq.gz -R2 /path_to_read2.fastq.gz -t /tags.csv -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 6000 -o /outputdir/

Run 2: CITE-seq-Count -R1 /path_to_read1.fastq.gz -R2 /path_to_read2.fastq.gz -t /tags.csv -cbf 1 -cbl 16 -umif 17 -umil 28 -cells 6000 --start-trim 10 --first_n 1000000 -o /outputdir/
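
For readers following along, here is a minimal sketch of what the -cbf/-cbl/-umif/-umil values in the two runs select from read 1. It assumes the positions are 1-based and inclusive, as the CITE-seq-Count help text describes them; the function name `split_read1` and the example sequence are made up for illustration and are not part of CITE-seq-Count. Whether 26 or 28 is the right -umil depends on the chemistry: 10x 3' v2 uses a 10 nt UMI and v3 uses a 12 nt UMI.

```python
def split_read1(seq, cb_first, cb_last, umi_first, umi_last):
    """Slice the cell barcode and UMI out of read 1.

    Positions are treated as 1-based and inclusive (an assumption that
    mirrors how the -cbf/-cbl/-umif/-umil options are documented).
    """
    if len(seq) < max(cb_last, umi_last):
        raise ValueError("read 1 is shorter than the requested positions")
    barcode = seq[cb_first - 1:cb_last]
    umi = seq[umi_first - 1:umi_last]
    return barcode, umi


# Made-up read 1: a 16 nt cell barcode followed by a 12 nt UMI.
read1 = "ACGTACGTACGTACGT" + "TTGCATTGCATT"

# Run 1 settings: -cbf 1 -cbl 16 -umif 17 -umil 26 (10 nt UMI)
bc, umi_run1 = split_read1(read1, 1, 16, 17, 26)

# Run 2 settings: -cbf 1 -cbl 16 -umif 17 -umil 28 (12 nt UMI)
_, umi_run2 = split_read1(read1, 1, 16, 17, 28)

print(bc, len(umi_run1), len(umi_run2))
```

With these settings, the barcode is the first 16 bases in both runs, and only the UMI length differs (10 vs 12 bases).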

Hoohm commented 3 years ago

Hello @jamesboot

The reason you get 240% is a bug in how reads are counted in version 1.4.4 when multiple files are used together with the first_n option.

You should be fine running the whole sample.

Version 1.5 is a complete overhaul of this step and the bug is fixed there.

I hope you feel reassured. Good catch!

jamesboot commented 3 years ago

Thanks @Hoohm! I've now run the whole sample and the percentage of mapped reads is 97%. Thanks!

Hoohm / CITE-seq-Count

Mapped/Unmapped Reads #156