GWW / scsnv

scSNV Mapping tool for 10X Single Cell Data
MIT License

Error at scsnv map step #35

Closed DLuong79 closed 3 months ago

DLuong79 commented 4 months ago

When I try to run the map step of my pipeline, I keep getting errors that prevent the merged.bam file from being created. The error seems to be different for each run, e.g.:

Data length wrong dl = 606 compared to 605
Error! Inconsistent number of reads between the read1 and read 2 files
error writing sam

I'm running this on a public dataset from the European Nucleotide Archive.

GWW commented 4 months ago

Have you checked that the number of lines in your read files is the same?

zcat read1.fastq.gz | wc -l
zcat read2.fastq.gz | wc -l

If you don't mind letting me know which dataset it is I can try to download it and take a look to see if I can identify/reproduce the issue.

DLuong79 commented 4 months ago

I downloaded the fastqs from this page: https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-9543/sdrf

I only downloaded files from Patient 386C and Patient 432C (sort by "Patient").

GWW commented 4 months ago

Is there a specific sample from those patients that is giving you the issue or are they all giving you an issue?

DLuong79 commented 3 months ago

All of them except for ERR9924270 (transverse colon) fail to reach the end of the pipeline.

GWW commented 3 months ago

I will download one and try it. In the meantime have you verified that your fastq files are complete?

zcat r1.fastq.gz | wc -l
zcat r2.fastq.gz | wc -l

Both commands should report the same number of lines.
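For anyone checking many samples, the line-count comparison above can be wrapped in a small helper that also tests the gzip stream, since a truncated download usually fails the archive check before the counts are even compared. This is only a sketch: `check_fastq_pair` is a hypothetical function name, not part of scsnv, and the multiple-of-4 check assumes standard unwrapped FASTQ records (four lines per read).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scsnv): sanity-check one R1/R2 FASTQ pair.
check_fastq_pair() {
    local r1="$1" r2="$2"
    local f n1 n2

    # gzip -t decompresses the whole archive and fails on truncation or CRC errors.
    for f in "$r1" "$r2"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT: $f"
            return 1
        fi
    done

    # Same check as the zcat | wc -l commands above.
    n1=$(gzip -dc "$r1" | wc -l)
    n2=$(gzip -dc "$r2" | wc -l)
    n1=$((n1)); n2=$((n2))  # normalise any whitespace padding from wc

    if [ "$n1" -ne "$n2" ]; then
        echo "MISMATCH: $r1 has $n1 lines, $r2 has $n2 lines"
        return 1
    fi

    # Assumption: unwrapped FASTQ, i.e. exactly 4 lines per read.
    if [ $((n1 % 4)) -ne 0 ]; then
        echo "TRUNCATED: line count $n1 is not a multiple of 4"
        return 1
    fi

    echo "OK: $((n1 / 4)) read pairs"
}
```

Usage would look like `check_fastq_pair r1.fastq.gz r2.fastq.gz`. A non-zero exit status means the pair should be re-downloaded before running scsnv.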

GWW commented 3 months ago

You probably have incomplete or corrupted fastq files somewhere. I successfully downloaded and processed this sample:

34G May  9 15:42 Human_colon_16S8117828_S1_L001_R1_001.fastq.gz
82G May  9 19:49 Human_colon_16S8117828_S1_L001_R2_001.fastq.gz
[18:35:09] Done writing unmapped reads
[18:35:09] Done writing 1606475407 reads and 9802001 marked as discarded
[18:35:09] Deleting the temporary bam files
[18:35:13] Done
scsnv count -k ~/scsnv/data/737K-august-2016.txt -l V2_5P -o colon_test/barcode colon_test/run1
scsnv map -l V2_5P -i ~/ref/scsnv/scsnv -g ~/ref/genome.fa -b ./colon_test/barcode -t 24 --bam-write 4 -q 4 -c ~/ref/gene_groups.txt -o colon_test/ colon_test/run1