Illumina / DRAGMAP

DRAGEN open-source mapper

Dragmap failed ERROR: This thread caught an exception first #11

Open quentin67100 opened 2 years ago

quentin67100 commented 2 years ago

Hi,

I want to test dragmap (currently I'm using bwa-mem2), but I get an error. One clarification first: I'm running dragmap from a Conda environment, the latest version. Command used:

dragen-os \
-r ${REF_Genome} \
-1 ${Fastq_DIR}/${read1} \
-2 ${Fastq_DIR}/${read2} \
--RGID HG001 \
--RGSM HG001 \
--num-threads ${CPU_number} \
| samtools view \
-b \
-h \
-L ${BED} \
-@ 2 \
> ${Align_DIR}/${ID}.trimmed.align.filtered.bam 2> ${Align_DIR}/logs/${ID}.trimmed.align.filtered.log

It fails after less than two minutes. At first it seems to run normally with multiple threads, but then it drops to a single thread and eventually fails. I do get the beginning of a BAM with aligned reads.

I am running with 11 threads and 90 GB of memory.

The log file:

2021-10-20 18:51:23     [2ba35fd534c0]  Version: 1.2.1
2021-10-20 18:51:23     [2ba35fd534c0]  argc: 13 argv: dragen-os -r /shared/projects/gentaumix/dragen/reference -1 /shared/projects/gentaumix/HG001/02_Trimming/fastq_drag/HG001.trimmed.R1.fastq.gz -2 /shared/projects/gentaumix/HG001/02_Trimming/fastq_drag/HG001.trimmed.R2.fastq.gz --RGID HG001 --RGSM HG001 --num-threads 11
decompHashTableCtxInit...
  0.824 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  1.926 seconds
decompHashTableExtIndex...
  0.041 seconds
decompHashTableAutoHits...
  44.869 seconds
decompHashTableSetFlags...
  6.205 seconds
finished decompress
Running dual fastq workflow on 11 threads. System supports 56 threads.
0   249 0   0   0   0   10000   1   40000   1   1000    0   0   0   6
0   250 0   0   0   0   10000   1   40000   1   1000    0   0   0   5
0   251 0   0   0   0   10000   1   40000   1   1000    0   0   0   4
0   252 0   0   0   0   10000   1   40000   1   1000    0   0   0   3
0   253 0   0   0   0   10000   1   40000   1   1000    0   0   0   2
0   254 0   0   0   0   10000   1   40000   1   1000    0   0   0   1
0   0   271 361 490 392.361 158.769 1   1147    1   789 89456   90372   0   0
Initial paired-end statistics detected for read group all, based on 89456 high quality pairs for FR orientation
Quartiles (25 50 75) = 271 361 490
Mean = 392.361
Standard deviation = 158.769
Rescue radius = 396.924
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 928
Boundaries for proper pairs: low = 1, high = 1147
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47982249010944]    ERROR: This thread caught an exception first

A couple of further details: I get exactly the same error if I redirect the dragmap output to a SAM file instead of piping it into samtools view, and I also tried version 1.2.0 with the same error.

How can I solve this?

rizkg commented 2 years ago

Hi, if that is public data, could you share the input FASTQ so that I can replicate the error?

quentin67100 commented 2 years ago

> Hi, if that is public data, could you share the input FASTQ so that I can replicate the error?

It's the FASTQ from this accession: SRR14724533. I used fastp with the default options plus poly-G tail trimming, then used the FASTQ output of fastp as the input for dragmap.
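
For reference, pulling the data and running the trimming step looks roughly like this (a sketch based on the description above; sra-tools is assumed, and the file names are placeholders rather than my exact command):

# download the paired-end data for the accession (sra-tools)
prefetch SRR14724533
fasterq-dump --split-files SRR14724533
gzip SRR14724533_1.fastq SRR14724533_2.fastq

# fastp with default options plus explicit poly-G tail trimming
fastp \
  -i SRR14724533_1.fastq.gz -I SRR14724533_2.fastq.gz \
  -o HG001.trimmed.R1.fastq.gz -O HG001.trimmed.R2.fastq.gz \
  --trim_poly_g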

rizkg commented 2 years ago

Thanks for the info! I'll get back to you when I have news.

quentin67100 commented 2 years ago

> Thanks for the info! I'll get back to you when I have news.

I just tested dragmap on the FASTQ from this accession directly (without running fastp first) and it works. On the other hand it's quite slow (16 h for a 30x human genome; with equivalent resources, bwa-mem2 takes 7.2 h on this sample), but I suppose that's the kind of thing that will improve in future versions.

rizkg commented 2 years ago

Hi, we were able to replicate the issue and found the cause; a fix will be available soon.

RichardCorbett commented 2 years ago

Hi folks. I'm seeing the same error using some private in-house whole genome data:

dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-01-10 14:27:53     [7f15d6033740]  Version: 1.2.1
2022-01-10 14:27:53     [7f15d6033740]  argc: 5 argv: dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.184 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.299 seconds
decompHashTableExtIndex...
  0.094 seconds
decompHashTableAutoHits...
  24.441 seconds
decompHashTableSetFlags...
  2.636 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0   249 0   0   0   0   10000   1   40000   1   1000    0   0   0   6   
0   250 0   0   0   0   10000   1   40000   1   1000    0   0   0   5   
0   251 0   0   0   0   10000   1   40000   1   1000    0   0   0   4   
0   252 0   0   0   0   10000   1   40000   1   1000    0   0   0   3   
0   253 0   0   0   0   10000   1   40000   1   1000    0   0   0   2   
0   254 0   0   0   0   10000   1   40000   1   1000    0   0   0   1   
[139729232258816]   ERROR: This thread caught an exception first

I see that error after about 3 hours of runtime, and the processes seem to hang and never return. I installed this version through Conda. Is there a recommended workaround?

rizkg commented 2 years ago

Hi Richard,
We were able to find and fix this bug, which arises when mapping some very short reads. We will publish the fix in this repo very soon.
Best,
Guillaume
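
In the meantime, if anyone needs a stopgap while mapping from FASTQ, one possible (unofficial) mitigation, assuming very short reads really are what triggers the crash on your data, is to drop them during trimming, e.g. with fastp's --length_required option (the threshold below is only illustrative):

# discard reads shorter than 20 bp before mapping (file names and threshold are placeholders)
fastp \
  -i in_R1.fastq.gz -I in_R2.fastq.gz \
  -o filt_R1.fastq.gz -O filt_R2.fastq.gz \
  --length_required 20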

rizkg commented 2 years ago

Hi,
A fix for this issue has been pushed to the master branch. Could you try again with the latest version from master on your data and check whether it fixes the bug you had?
Guillaume
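
If it helps, testing the master branch means building from source rather than Conda; a minimal sketch, assuming the standard Makefile build described in the repo README (adjust the job count to your machine):

git clone https://github.com/Illumina/DRAGMAP.git
cd DRAGMAP
make -j 8    # per the README, use HAS_GTEST=0 make if googletest is not installed
# the freshly built binary should then be under build/release/dragen-os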

RichardCorbett commented 2 years ago

Hi there.
I installed from master but got the same error again:

for b in $(ls *bam); do echo "/gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b ${b}  > ${b}_dragmap.sam"; done  | bash -x
+ /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-01-20 12:33:03     [7f177c99f7c0]  Version: 1.2.1-5-gf36d7849
2022-01-20 12:33:03     [7f177c99f7c0]  argc: 5 argv: /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.741 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.795 seconds
decompHashTableExtIndex...
  0.077 seconds
decompHashTableAutoHits...
  28.186 seconds
decompHashTableSetFlags...
  3.060 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0   249 0   0   0   0   10000   1   40000   1   1000    0   0   0   6   
0   250 0   0   0   0   10000   1   40000   1   1000    0   0   0   5   
0   251 0   0   0   0   10000   1   40000   1   1000    0   0   0   4   
0   252 0   0   0   0   10000   1   40000   1   1000    0   0   0   3   
0   253 0   0   0   0   10000   1   40000   1   1000    0   0   0   2   
0   254 0   0   0   0   10000   1   40000   1   1000    0   0   0   1   
[139737098520320]   ERROR: This thread caught an exception first

rizkg commented 2 years ago

Hi, thanks for checking. I am working on it.

rizkg commented 2 years ago

Hi, a new fix was pushed to the master branch. Could you check again on your data?
Thanks,
Guillaume

RichardCorbett commented 2 years ago

Looks like I still get an error:

dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-02-03 10:28:41     [7f320a6287c0]  Version: 1.2.1-7-gc87d93aa
2022-02-03 10:28:41     [7f320a6287c0]  argc: 5 argv: /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-7/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.505 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.205 seconds
decompHashTableExtIndex...
  0.070 seconds
decompHashTableAutoHits...
  23.794 seconds
decompHashTableSetFlags...
  1.850 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0   249 0   0   0   0   10000   1   40000   1   1000    0   0   0   6   
0   250 0   0   0   0   10000   1   40000   1   1000    0   0   0   5   
0   251 0   0   0   0   10000   1   40000   1   1000    0   0   0   4   
0   252 0   0   0   0   10000   1   40000   1   1000    0   0   0   3   
0   253 0   0   0   0   10000   1   40000   1   1000    0   0   0   2   
0   254 0   0   0   0   10000   1   40000   1   1000    0   0   0   1   
[139851179972352]   ERROR: This thread caught an exception first

RichardCorbett commented 2 years ago

I have permission to share the data with you if it helps

rizkg commented 2 years ago

Hi Richard, yes, that would be very helpful! How big is it?

RichardCorbett commented 2 years ago

To share the BAM and the reference I'm using, it would be about 52 GB.

RichardCorbett commented 2 years ago

Hi @rizkg, have you had any luck reproducing my error? I'm getting some pressure at my center to have this up and running, so please let me know if there is anything else I can provide.

RichardCorbett commented 2 years ago

Also, do you think it might help if I tried running your binary directly (or in a container)?

rizkg commented 2 years ago

Hello,
Yes, I have been able to reproduce the error. It does not seem to come from your hash table or from your binary; the problem seems to be in the BAM parsing code.

As a temporary workaround, you could first convert your BAM to FASTQ, e.g.

samtools bam2fq B46157_4_lanes_dupsFlagged.bam | gzip > file.fastq.gz

and then run dragmap on that FASTQ file, e.g.

dragen-os -r hg38_no_alt_dragmap_ref -1 file.fastq.gz --output-directory ./ --output-file-prefix B46157

I'll keep you posted as soon as we have a fix for this.

RichardCorbett commented 2 years ago

Thanks. Trying it out now.

rizkg commented 2 years ago

Hello again,
Forget what I said before; that would give you single-end mapping. The issue is that we do not support BAM input sorted by coordinate; it should be sorted by read name. So you should do, e.g.

samtools sort --threads 16 -n B46157_4_lanes_dupsFlagged.bam > B46157_4_lanes_dupsFlagged_name_sorted.bam

Then use the name-sorted BAM as the dragmap input, and specify --interleaved true in the dragmap options to get paired-end mapping. We'll add a proper check and error message for this problem.
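
Putting the two steps together, the workaround would look roughly like this (file names taken from your earlier command; the thread count and output prefix are arbitrary):

# 1. re-sort the BAM by read name (coordinate-sorted BAM input is not supported)
samtools sort --threads 16 -n B46157_4_lanes_dupsFlagged.bam > B46157_4_lanes_dupsFlagged_name_sorted.bam

# 2. map the name-sorted BAM, treating it as interleaved paired-end input
dragen-os -r hg38_no_alt_dragmap_ref \
  -b B46157_4_lanes_dupsFlagged_name_sorted.bam \
  --interleaved true \
  --output-directory ./ \
  --output-file-prefix B46157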

yangyxt commented 2 years ago

I'm having the same problem, except that I'm inputting paired FASTQ files instead of a BAM file.
The command looks like this:

dragen-os -r /paedyl01/disk1/yangyxt/indexed_genome/hg19 -1 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_1_val_1.fq.gz -2 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_2_val_2.fq.gz --num-threads 23 --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID A160792B --RGSM A160792B --output-directory /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results --output-file-prefix A160792B

And here is the error log:

2022-04-22 17:37:05     [2b2fe2e5ee00]  Version: 1.2.1
2022-04-22 17:37:05     [2b2fe2e5ee00]  argc: 24 argv: dragen-os -r /paedyl01/disk1/yangyxt/indexed_genome/hg19 -1 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_1_val_1.fq.gz -2 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_2_val_2.fq.gz --num-threads 23 --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID A160792B --RGSM A160792B --output-directory /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results --output-file-prefix A160792B.bqsr
decompHashTableCtxInit...
  1.133 seconds
decompHashTableHeader...
  0.001 seconds
decompHashTableLiterals...
  1.627 seconds
decompHashTableExtIndex...
  0.044 seconds
decompHashTableAutoHits...
  19.191 seconds
decompHashTableSetFlags...
  1.453 seconds
finished decompress
INFO: writing SAM file to "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.sam"
INFO: writing mapping metrics stats into "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.mapping_metrics.csv"
INFO: writing insert stats into "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.insert-stats.tab"
Running dual fastq workflow on 23 threads. System supports 80 threads.
Initial paired-end statistics detected for read group all, based on 88335 high quality pairs for FR orientation
Quartiles (25 50 75) = 233 300 373
Mean = 304.777
Standard deviation = 106.367
Rescue radius = 265.917
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 653
Boundaries for proper pairs: low = 1, high = 793
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47523547105024]    ERROR: This thread caught an exception first
/paedyl01/disk1/yangyxt/ngs_scripts/common_bash_utils.sh: line 3651: 297460 Segmentation fault (core dumped) dragen-os -r ${ref_genome_dir} -1 ${forward_reads} -2 ${reverse_reads} --num-threads ${threads} --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID ${samp_ID} --RGSM ${samp_ID} --output-directory $(dirname ${output_align}) --output-file-prefix $(basename ${output_align/.bam/})

yangyxt commented 2 years ago

Sorry, I don't know why text wrapping is disabled... I'll paste the key lines from the error log below:

Initial paired-end statistics detected for read group all, based on 88335 high quality pairs for FR orientation
Quartiles (25 50 75) = 233 300 373
Mean = 304.777
Standard deviation = 106.367
Rescue radius = 265.917
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 653
Boundaries for proper pairs: low = 1, high = 793
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47523547105024]    ERROR: This thread caught an exception first
/paedyl01/disk1/yangyxt/ngs_scripts/common_bash_utils.sh: line 3651: 297460 Segmentation fault (core dumped)

rizkg commented 2 years ago

Hi, thanks for your report. Although this is the same error message as the earlier reports in this thread, I'm not sure it has a common cause. We are working on reporting more meaningful error messages. Meanwhile, would you be able to share your input files?
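
Even a down-sampled subset of the FASTQ pair that still reproduces the crash would be enough, for example something like this (a sketch using seqtk; the read count and seed are arbitrary, and using the same seed for both files keeps the pairs in sync):

seqtk sample -s 42 A160792B_1_val_1.fq.gz 2000000 | gzip > A160792B_sub_R1.fq.gz
seqtk sample -s 42 A160792B_2_val_2.fq.gz 2000000 | gzip > A160792B_sub_R2.fq.gz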

yangyxt commented 2 years ago

Thank you for the response! I'm not sure I can. Even if I wanted to, the FASTQ files are huge since they are WGS samples.

Rohit-Satyam commented 2 years ago

Hi, I am facing the same issue. I am aligning my short reads to the SARS-CoV-2 reference genome.

dragen-os --num-threads 10 -r results/04_alignDRAGMAP/index/dragmapidx -1 data/S9_1.fastq.gz -2 data/S9_2.fastq.gz > temp.sam

2022-07-28 19:30:32     [14afcfe29740]  Version: 1.3.0
2022-07-28 19:30:32     [14afcfe29740]  argc: 9 argv: dragen-os --num-threads 10 -r results/04_alignDRAGMAP/index/dragmapidx -1 data/S9_1.fastq.gz -2 data/S9_2.fastq.gz
decompHashTableCtxInit...
  0.000 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  0.004 seconds
decompHashTableExtIndex...
  0.000 seconds
decompHashTableAutoHits...
  0.010 seconds
decompHashTableSetFlags...
  0.004 seconds
finished decompress
Running dual fastq workflow on 10 threads. System supports 112 threads.
0   249 0   0   0   0   10000   1   40000   1   1000    0   0   0   6   
0   250 0   0   0   0   10000   1   40000   1   1000    0   0   0   5   
0   251 0   0   0   0   10000   1   40000   1   1000    0   0   0   4   
0   252 0   0   0   0   10000   1   40000   1   1000    0   0   0   3   
0   253 0   0   0   0   10000   1   40000   1   1000    0   0   0   2   
0   254 0   0   0   0   10000   1   40000   1   1000    0   0   0   1   
Segmentation fault (core dumped)

The index was created using

samtools faidx dragmapidx/$fasta 
gatk CreateSequenceDictionary -R dragmapidx/$fasta  
dragen-os --build-hash-table true --ht-reference dragmapidx/$fasta  --output-directory dragmapidx --ht-num-threads 20
gatk ComposeSTRTableFile -R dragmapidx/$fasta -O dragmapidx/str_table.tsv

and the resulting directory looks like:

hash_table.cfg
hash_table.cfg.bin
hash_table.cmp
hash_table_stats.txt
reference.bin
ref_index.bin
repeat_mask.bin
sequence.dict
sequence.fasta
sequence.fasta.fai
str_table.bin
str_table.tsv

AishaShah commented 1 year ago

Is this bug solved in the latest version?

Rohit-Satyam commented 1 year ago

The latest release was on May 5th, 2022, and I reported the bug in July 2022, so no, it isn't solved in a release yet.