eudoraleer / scasa

SCASA: Single cell transcript quantification tool
GNU General Public License v3.0

no mapped reads in test on SRA sample #7

Closed: ldiao closed this issue 1 year ago

ldiao commented 1 year ago

Hi, I was trying to test scasa on SRA sample SRR23717237: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23717237&display=metadata

There is an error in the alignment log (My_Project_20230408030128.align.SRR23717237_S1_L001.20230408030128.o), which is captured below. It seems that no reads are being aligned.

I am using the default reference and tried two different whitelist files. The first was the full whitelist downloaded from https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz, and the second contained only the barcodes from this sample. Both runs gave the same error message.

Error: note the warning below about CB+UMI length and the 0% mapping rate.

[2023-04-08 03:05:40.692] [jointLog] [info] Computed 0 rich equivalence classes for further processing
[2023-04-08 03:05:40.692] [jointLog] [info] Counted 0 total reads in the equivalence classes
[2023-04-08 03:05:40.692] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
[2023-04-08 03:05:40.700] [jointLog] [warning] Found 145185909 reads with CB+UMI length smaller than expected. Please report on github if this number is too large
[2023-04-08 03:05:40.700] [jointLog] [info] Mapping rate = 0%
[2023-04-08 03:05:40.700] [jointLog] [info] finished quantifyLibrary()
[2023-04-08 03:05:40.725] [alevinLog] [info] Starting optimizer
[2023-04-08 03:05:41.434] [alevinLog] [info] Total 0.00 UMI after deduplicating.
[2023-04-08 03:05:41.434] [alevinLog] [info] Total 0 BiDirected Edges.
[2023-04-08 03:05:41.434] [alevinLog] [info] Total 0 UniDirected Edges.
[2023-04-08 03:05:41.434] [alevinLog] [warning] Skipped 1051 barcodes due to No mapped read
[2023-04-08 03:05:41.437] [alevinLog] [info] Starting dumping cell v gene counts in mtx format
[2023-04-08 03:05:41.437] [alevinLog] [error] Can't import Binary file quants.mat.gz, it doesn't exist
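The key numbers in a log like this one can be extracted programmatically. The Python sketch below is a hypothetical helper, not part of scasa or alevin. Its regular expressions are copied from the log lines quoted in this issue and may not match the wording of other alevin versions. The rule in `looks_like_chemistry_mismatch` (a 0% mapping rate plus any reads with a too-short CB+UMI) is only a heuristic.

```python
import re

# Patterns copied from the alevin log messages quoted in this issue;
# other alevin versions may word these messages differently.
PATTERNS = {
    "mapping_rate_pct": re.compile(r"Mapping rate = ([\d.]+)%"),
    "short_cb_umi_reads": re.compile(
        r"Found (\d+) reads with CB\+UMI length smaller than expected"
    ),
    "barcodes_without_mapped_reads": re.compile(
        r"Skipped (\d+) barcodes due to No mapped read"
    ),
}


def summarize_alevin_log(text: str) -> dict:
    """Extract key diagnostics from alevin log text (hypothetical helper)."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            value = match.group(1)
            # The mapping rate is a percentage; the other fields are counts.
            summary[key] = float(value) if key == "mapping_rate_pct" else int(value)
    return summary


def looks_like_chemistry_mismatch(summary: dict) -> bool:
    """Heuristic: nothing mapped AND some reads had a too-short CB+UMI."""
    return (
        summary.get("mapping_rate_pct") == 0.0
        and summary.get("short_cb_umi_reads", 0) > 0
    )
```

Pass the log file's contents to `summarize_alevin_log` and then call `looks_like_chemistry_mismatch` on the result. A `True` result is only a hint to check the `--tech` setting, not a diagnosis.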

ldiao commented 1 year ago

OK, after some digging I found the issue. The error came from alevin: this sample was actually sequenced with 10x v2 chemistry, not 10x v3. When I changed the option to --tech 10xv2, the error went away. More than 40% of reads are still being discarded as "noisy cellular barcodes", but that is an alevin issue, not a scasa one.
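The standard 10x v2 and v3 layouts differ in UMI length on read 1. Both start with a 16 bp cell barcode, followed by a 10 bp UMI in v2 and a 12 bp UMI in v3. Read 1 must therefore be at least 26 bp for v2 and at least 28 bp for v3, which fits the "CB+UMI length smaller than expected" warning above. Assuming those layouts, you can check which `--tech` values a sample can support by looking at its read-1 lengths. The helpers `read1_lengths` and `compatible_chemistries` are hypothetical and not part of scasa. A read 1 of 28 bp or longer fits both layouts, so this check can rule v3 out but cannot confirm it.

```python
import gzip
from collections import Counter
from itertools import islice

# Assumed standard 10x read-1 layouts: (cell barcode length, UMI length).
LAYOUTS = {"10xv2": (16, 10), "10xv3": (16, 12)}


def read1_lengths(path: str, n_reads: int = 10000) -> Counter:
    """Count sequence lengths in the first n_reads records of a read-1 FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        # FASTQ records are 4 lines long; the sequence is the 2nd line of each.
        for i, line in enumerate(islice(handle, 4 * n_reads)):
            if i % 4 == 1:
                lengths[len(line.strip())] += 1
    return lengths


def compatible_chemistries(read1_length: int) -> list:
    """Return the layouts whose CB+UMI fits within a read of this length."""
    return [
        tech
        for tech, (cb_len, umi_len) in LAYOUTS.items()
        if read1_length >= cb_len + umi_len
    ]
```

For example, `read1_lengths("sample_R1.fastq.gz").most_common(1)` returns the most common read-1 length (the file name is a placeholder). If that length is 26, `compatible_chemistries(26)` returns `["10xv2"]`.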