ShuyangXu closed this issue 8 months ago
??
??
Sorry, there was a typo earlier.
I have edited the post again.
Hi @ShuyangXu, based on your command, I think you might be using an outdated version of Cell Ranger, but in any event I suspect this is due to duplicate QNAMEs in your FASTQ file.
Earlier versions of Cell Ranger assumed that the input FASTQ files were produced by Illumina and so would not have duplicated QNAMEs. Given a set of reads with a given barcode/UMI combination, we previously marked the read with the "lowest" QNAME as the one that should be counted. However, we discovered (thanks to a report like this one) that customers would accidentally create FASTQ files with duplicated QNAMEs. This duplication led to a given barcode/UMI being double counted, because the assumption that each read name is unique is violated in data that has multiple copies of the same read name. Does your FASTQ file contain QNAMEs that are not unique? For example, below are two occurrences of the same QNAME; a QNAME appearing twice like this in a FASTQ file would cause the issue:
@A01182:88:HCWMWDSX3:1:2626:20365:26052 1:N:0:TCCAACAACG+AAACCCGGAC
@A01182:88:HCWMWDSX3:1:2626:20365:26052 1:N:0:TCCAACAACG+AAACCCGGAC
Are all the QNAMEs in your FASTQ file unique?
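One way to answer this question is a short script like the one below. It is only a sketch: the function name `duplicated_qnames` is made up for illustration, it assumes plain 4-line FASTQ records (optionally gzip-compressed), and it treats the header text up to the first whitespace as the QNAME.

```python
import gzip
from collections import Counter


def duplicated_qnames(fastq_path):
    """Return {qname: occurrences} for every QNAME seen more than once.

    Hypothetical helper, not part of Cell Ranger.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumes 4-line records, so headers sit on lines 0, 4, 8, ...
            if line_number % 4 != 0:
                continue
            # QNAME = header without the leading "@", up to the first whitespace.
            fields = line[1:].split()
            if fields:
                counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

An empty result means every QNAME in the file is unique. All read names are held in memory, so for very large files a sort-based approach may be more practical.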
Thanks for your reply.
As I mentioned at the beginning, not all the QNAMEs in my data are unique. A colleague combined the data by mistake.
This problem can also be reproduced using Cell Ranger v7.0.0 with the additional data (./external/cellranger_tiny_fastq/)
Considering speed and performance, I can understand why you use such a trick to count UMIs, but I think it needs to be made more robust.
Sorry for publishing this issue before I had finished editing it the first time.
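To illustrate why the "lowest QNAME" rule described earlier in the thread breaks down, here is a toy sketch. It is not Cell Ranger's actual code: the function name, the tuple layout, and the read, barcode and UMI labels are all invented for illustration.

```python
from collections import defaultdict


def count_umis_by_lowest_qname(reads):
    """reads: iterable of (qname, barcode, umi) tuples.

    Hypothetical illustration: within each barcode/UMI group, every read
    whose QNAME equals the lowest QNAME in the group is flagged as counted.
    """
    groups = defaultdict(list)
    for qname, barcode, umi in reads:
        groups[(barcode, umi)].append(qname)

    counted = defaultdict(int)
    for (barcode, _umi), qnames in groups.items():
        lowest = min(qnames)
        # With unique QNAMEs exactly one read matches; a duplicated
        # QNAME matches more than once and inflates the count.
        counted[barcode] += sum(1 for q in qnames if q == lowest)
    return dict(counted)


unique_reads = [("read1", "BC1", "UMI1"), ("read2", "BC1", "UMI1")]
duplicated_reads = unique_reads + unique_reads

print(count_umis_by_lowest_qname(unique_reads))      # {'BC1': 1}
print(count_umis_by_lowest_qname(duplicated_reads))  # {'BC1': 2}
```

The rule avoids tracking every UMI explicitly, but it is only correct as long as each QNAME occurs once.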
Hi,
Recently, I ran cellranger on faulty FASTQ data that contained some duplicated reads (same ID, same sequence).
I filtered them out and reran cellranger, but I found that the UMI counts in the two results are different. This is a little odd because, as is well known, UMIs are counted by the number of unique UMIs, not by the number of reads. The duplicated reads should therefore be merged into the same UMI and contribute nothing.
I then used the example FASTQ files to test and reproduce the issue.
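The expectation above can be written as a small sketch. This is a simplified model, not how cellranger works internally: real pipelines also take the gene into account and correct UMI errors. The function name and tuple layout are assumptions for illustration.

```python
def count_unique_umis(reads):
    """reads: iterable of (qname, barcode, umi) tuples.

    Returns {barcode: number of distinct UMIs}. Simplified model that
    ignores genes and UMI error correction.
    """
    seen = set()
    for _qname, barcode, umi in reads:
        # A set keeps each barcode/UMI pair once, however many reads carry it.
        seen.add((barcode, umi))

    counts = {}
    for barcode, _umi in seen:
        counts[barcode] = counts.get(barcode, 0) + 1
    return counts


reads = [
    ("read1", "BC1", "UMI1"),
    ("read2", "BC1", "UMI2"),
    ("read3", "BC2", "UMI1"),
]
# Appending an exact copy of every read leaves the result unchanged.
assert count_unique_umis(reads) == count_unique_umis(reads + reads)
```

Under this model, doubling the reads doubles the read count but leaves the UMI counts as they were.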
reproduce issue
result
normal
web_summary.html
matrix.mtx.gz
dup
web_summary.html
matrix.mtx.gz
discuss
a) First, as expected, dup's Number of Reads is doubled. However, the UMI counts are doubled as well. I then looked in dup's molecule_info.h5:
molecule_info.h5/umi
molecule_info.h5/barcode_idx
The same UMI in the same barcode appears twice.
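A check along these lines can be sketched as follows. It assumes the third-party `h5py` package is installed and that the two datasets named above sit at the top level of the file. The helper names `repeated_rows` and `load_columns` are made up for illustration.

```python
from collections import Counter


def repeated_rows(*columns):
    """Return {row: occurrences} for rows that appear more than once.

    Each argument is one column, e.g. barcode_idx and umi; rows are
    formed by zipping the columns together.
    """
    counts = Counter(tuple(int(v) for v in row) for row in zip(*columns))
    return {row: n for row, n in counts.items() if n > 1}


def load_columns(path):
    """Read the two datasets mentioned in the post (requires h5py)."""
    import h5py  # imported lazily so repeated_rows works without it

    with h5py.File(path, "r") as f:
        return f["barcode_idx"][:], f["umi"][:]


# Toy data: the pair (barcode 0, UMI 5) occurs twice.
print(repeated_rows([0, 0, 1], [5, 5, 5]))  # {(0, 5): 2}
```

On a real file this would be called as `repeated_rows(*load_columns("molecule_info.h5"))`. A repeated barcode/UMI pair is not necessarily an error, since the same pair may belong to different genes or libraries. Passing further columns to `repeated_rows` narrows the comparison.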
b) I also noticed that cellranger throws an error
when the duplicated reads are supplied in two different FASTQ files (rather than within one file, as above).
In conclusion, I wonder whether there are two issues: a) are UMIs miscounted when reads are duplicated? b) is the case of duplicated reads within a single file not handled?
Thank you