SciLifeLab / Sarek

Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing
https://nf-co.re/sarek
MIT License

BWA map error #808

Closed fw1121 closed 4 years ago

fw1121 commented 5 years ago

When I run:

nextflow run SciLifeLab/Sarek/main.nf --sample dnaseq_run/multiple.tsv \
--step mapping --genome GRCh38 --genome_base /db/dnaseq/GRCh38 \
--tag latest -profile docker

I get this error:

  [W::sam_read1] Parse error at line 11210991
  samtools sort: truncated file. Aborting

However, if I change to the work directory named in the error message:

Work dir:
  /dnaseq_run/work/03/5ca165d989087ab1cb5aaa61076a7f

and then run:

sh .command.sh

which contains:

#!/bin/bash -euo pipefail
bwa mem -R "@RG\tID:PCL-016_N_1\tPU:PCL-016_N_1\tSM:PCL-016_N\tLB:PCL-016_N\tPL:illumina"  -t 20 -M     Homo_sapiens_assembly38.fasta SRR1713550_R1.fastq.gz SRR1713550_R2.fastq.gz |     samtools sort --threads 2 -m 2G - > PCL-016_N_1.bam

everything works fine.
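For comparing the failed task with the manual rerun, a minimal sketch of a work-directory check follows. It assumes the standard files Nextflow writes into each task directory (`.exitcode`, `.command.err`); the function name `inspect_workdir` is hypothetical and not part of Sarek or Nextflow.

```shell
#!/bin/bash
# Hypothetical helper: summarise a failed Nextflow task from its work directory.
# Usage: inspect_workdir /path/to/work/xx/hash
inspect_workdir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # .exitcode holds the task's exit status, if the task got far enough to write it
    if [ -f "$dir/.exitcode" ]; then
        local code
        code=$(tr -d '[:space:]' < "$dir/.exitcode")
        echo "exit code: $code"
        # 137 = 128 + 9 (SIGKILL), which is often the out-of-memory killer
        if [ "$code" = "137" ]; then
            echo "hint: task was killed (SIGKILL); check container memory limits"
        fi
    else
        echo "exit code: unknown (.exitcode missing)"
    fi
    # .command.err holds the task's stderr; show only its tail
    if [ -f "$dir/.command.err" ]; then
        echo "last stderr lines:"
        tail -n 5 "$dir/.command.err"
    fi
}
```

Two differences between the manual rerun and the pipeline run may matter here. `sh .command.sh` ignores the shebang line, so the `-euo pipefail` options are not in effect, and it uses the host's `bwa` and `samtools` rather than the ones in the Docker image. Running `bash .command.run` from the same directory goes through Nextflow's wrapper, including the container launch, and so is closer to what the pipeline did.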

PS. input raw data size:

maxulysse commented 5 years ago

Hi @fw1121, thanks a lot for your interest and for reporting this issue.

We're currently porting sarek to nf-core: https://github.com/nf-core/sarek/tree/dev

I do remember running into a similar error once or twice, but I haven't seen it since I started working on the nf-core port, where I added the "-K 100000000" parameter to bwa mem. It fixes the number of input bases processed per batch regardless of the thread count, since the chunk size can affect BWA results: https://github.com/nf-core/sarek/blob/c9088ffed8377bb55db95609c265f6d3e2ef1f1b/main.nf#L455-L459

Can you try that maybe? Can you show your tsv file as well? All the best, Maxime
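The "-K" suggestion can be tried by hand before editing main.nf. The following is only a dry-run sketch: the function `print_bwa_pipeline` is hypothetical, every argument is a placeholder, and it prints the command rather than running it, so neither bwa nor samtools is needed to check the wording.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the bwa mem | samtools sort pipeline
# with a fixed -K batch size. Nothing is executed.
# Args: read_group bwa_threads sort_threads genome fastq1 fastq2 out_bam
print_bwa_pipeline() {
    local read_group="$1" bwa_threads="$2" sort_threads="$3"
    local genome="$4" r1="$5" r2="$6" out="$7"
    # -K sets the number of input bases per batch, independent of -t.
    # samtools sort -m is per thread, so sort memory is roughly sort_threads x 2G.
    printf 'bwa mem -K 100000000 -R "%s" -t %s -M %s %s %s | samtools sort --threads %s -m 2G - > %s\n' \
        "$read_group" "$bwa_threads" "$genome" "$r1" "$r2" "$sort_threads" "$out"
}
```

For example, `print_bwa_pipeline '@RG\tID:PCL-016_N_1' 20 2 Homo_sapiens_assembly38.fasta SRR1713550_R1.fastq.gz SRR1713550_R2.fastq.gz PCL-016_N_1.bam` prints a command line that can be pasted into the task's work directory.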

fw1121 commented 5 years ago

Hi @MaxUlysse, I have uploaded my demo tsv file. All the data listed in it was obtained from NCBI, so you can try it yourself.

multiple.tsv.zip

Best wishes, Wei

fw1121 commented 5 years ago

Hi @MaxUlysse, as you suggested, I changed the command to:

    """
        bwa mem -K 100000000 -R \"${readGroup}\" ${extra} -t ${task.cpus} -M \
            ${genomeFile} ${inputFile1} ${inputFile2} | \
        samtools sort --threads ${task.cpus} -m 2G - > ${idRun}.bam
    """

It is still not working.

Best wishes, Wei

maxulysse commented 4 years ago

Closing due to moving to nf-core/sarek. If you still have the issue @fw1121 can you please open a new one on the nf-core repo? All the best, Maxime