bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Error during alignment using STAR #3716

Closed tdelisper closed 11 months ago

tdelisper commented 1 year ago

Hello, I am running a bulk RNA-seq analysis using STAR and hg38, and I get this error during alignment:

  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/ipythontasks.py", line 54, in _setup_logging
    yield config
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/ipythontasks.py", line 300, in process_alignment
    return ipython.zip_args(apply(sample.process_alignment, args))
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/ipythontasks.py", line 82, in apply
    return object(args, kwargs)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/sample.py", line 140, in process_alignment
    data = align_to_sort_bam(fastq1, fastq2, aligner, data)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/alignment.py", line 87, in align_to_sort_bam
    names, align_dir, data)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/alignment.py", line 164, in _align_from_fastq
    out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/ngsalign/star.py", line 118, in align
    do.run(cmd.format(locals()), run_message, None)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/galaxy/../anaconda/bin/STAR --genomeDir /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/genomes/Hsapiens/hg38/star/ --readFilesIn <(gunzip -c /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/input/SRR8375411_1.fastq.gz) <(gunzip -c /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/input/SRR8375411_2.fastq.gz) --runThreadN 16 --outFileNamePrefix /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/work/bcbiotx/tmpngmy3f66/SRR83754111pass/SRR8375411 --outReadsUnmapped Fastx --outFilterMultimapNmax 10 --outStd BAM_Unsorted --limitOutSJcollapsed 2000000 --outSAMtype BAM Unsorted --outSAMmapqUnique 60 --outSAMunmapped Within --outSAMattributes NH HI NM MD AS --sjdbGTFfile /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf --sjdbOverhang 124 --outSAMattrRGline ID:SRR8375411 PL:illumina PU:SRR8375411 SM:SRR8375411 --quantMode TranscriptomeSAM | /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 2G -T /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/work/bcbiotx/tmpngmy3f66/SRR83754111pass/SRR8375411_star/SRR8375411-sorttmp -o /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/work/bcbiotx/tmpngmy3f66/SRR83754111pass/SRR8375411_star/SRR8375411.bam /dev/stdin > /mnt/hcctcga_a/groups/hcctcga/lab_members/tdelisper/ALS_GSE124439/bcbio/SraRunTable-124439/work/bcbiotx/tmpngmy3f66/SRR83754111pass/SRR8375411_star/SRR8375411.bam
[W::bgzf_read_block] EOF marker is absent. The input may be truncated
samtools sort: truncated file. Aborting
' returned non-zero exit status 1.

The error probably occurs while the BAM file is being generated. Any suggestions on how to solve it?

Thank you, Triantafyllos

naumenko-sa commented 1 year ago

If you are running multiple samples, do the other samples work? Could you check your input files, input/SRR8375411_?.fastq.gz? Are both files valid gzip files with the same number of lines?
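One way to run that check is the minimal Python sketch below. It is not part of bcbio; the helper names `count_gzip_lines` and `check_fastq_pair` are made up for illustration. It decompresses each file to the end, so a truncated or corrupt gzip stream raises an error, then compares the line counts of the two mates. It only tests the inputs and does not by itself explain the samtools message.

```python
import gzip
import zlib


def count_gzip_lines(path):
    """Decompress the whole file and return its line count.

    Reading to the end forces the gzip stream to be fully decoded, so a
    truncated or corrupt file raises an error instead of returning a
    partial count.
    """
    lines = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    return lines


def check_fastq_pair(fastq1, fastq2):
    """Return a list of problems found in a paired-end FASTQ pair (empty if none)."""
    problems = []
    counts = []
    for path in (fastq1, fastq2):
        try:
            counts.append(count_gzip_lines(path))
        except (OSError, EOFError, zlib.error) as exc:
            # OSError covers bad gzip headers; EOFError covers truncation.
            problems.append(f"{path}: not a valid gzip file ({exc})")
    if len(counts) == 2:
        if counts[0] != counts[1]:
            problems.append(f"line counts differ: {counts[0]} vs {counts[1]}")
        if any(count % 4 for count in counts):
            # Standard FASTQ records span exactly four lines.
            problems.append("line count is not a multiple of 4")
    return problems
```

Calling `check_fastq_pair("input/SRR8375411_1.fastq.gz", "input/SRR8375411_2.fastq.gz")` from the project directory should return an empty list if both files are intact and have matching line counts.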

Also if you could supply your project.yaml config, that might be helpful.

naumenko-sa commented 11 months ago

End-of-year cleanup. Please feel free to re-open.