barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Error(s) in GenomeDiff format. FILE when trying to add duplicate #323

Closed (xiaolinchu92 closed 1 year ago)

xiaolinchu92 commented 1 year ago

Hello, I'm getting an error very similar to #203 and #144, but I'm using the newer version 0.36.1. This is my second breseq run for a sample, using a merged junction list that was extracted and merged with the command:

gdtools UNION -o .../.../merged_breseq_output.gd -e .../samp1/output.gd .../samp2/output.gd

and then:

breseq -p -j 1 --user-evidence-gd .../.../merged_breseq_output.gd etc.

where etc. stands for the options from the initial run, which did not use --user-evidence-gd and worked fine.

>>> Error(s) in GenomeDiff format. FILE:  <<<

>>ERROR: Attempt to add duplicate of this existing entry:
JC<tab>5<tab>.<tab>NC_012660<tab>6615547<tab>1<tab>NC_012660<tab>6615564<tab>-1<tab>0<tab>alignment_overlap=0<tab>coverage_minus=102<tab>coverage_plus=82<tab>flanking_left=150<tab>flanking_right=150<tab>key=NC_012660__6615547__1__NC_012660__6615564__-1__0____150__150__0__1<tab>max_left=144<tab>max_left_minus=144<tab>max_left_plus=141<tab>max_min_left=74<tab>max_min_left_minus=74<tab>max_min_left_plus=72<tab>max_min_right=75<tab>max_min_right_minus=75<tab>max_min_right_plus=75<tab>max_pos_hash_score=278<tab>max_right=144<tab>max_right_minus=144<tab>max_right_plus=143<tab>neg_log10_pos_hash_p_value=NT<tab>pos_hash_score=122<tab>side_1_continuation=0<tab>side_1_overlap=0<tab>side_1_redundant=0<tab>side_2_continuation=0<tab>side_2_overlap=0<tab>side_2_redundant=1<tab>total_non_overlap_reads=184
Add a 'unique' tag to one if this is intentional.
>>ON LINE:     0
JC<tab>55<tab>.<tab>NC_012660<tab>6615547<tab>1<tab>NC_012660<tab>6615564<tab>-1<tab>0<tab>alignment_overlap=0<tab>coverage_minus=0<tab>coverage_plus=0<tab>flanking_left=150<tab>flanking_right=150<tab>key=NC_012660__6615547__1__NC_012660__6615564__-1__0____150__150__0__0__UD<tab>max_left=0<tab>max_left_minus=0<tab>max_left_plus=0<tab>max_min_left=0<tab>max_min_left_minus=0<tab>max_min_left_plus=0<tab>max_min_right=0<tab>max_min_right_minus=0<tab>max_min_right_plus=0<tab>max_pos_hash_score=278<tab>max_right=0<tab>max_right_minus=0<tab>max_right_plus=0<tab>neg_log10_pos_hash_p_value=NT<tab>pos_hash_score=0<tab>reject=COVERAGE_EVENNESS_SKEW<tab>side_1_continuation=0<tab>side_1_overlap=0<tab>side_1_redundant=0<tab>side_2_continuation=0<tab>side_2_overlap=0<tab>side_2_redundant=0<tab>total_non_overlap_reads=0<tab>user_defined=1

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Fatal formatting error in GD file being written: .../.../rebreseq_output/C26bEP_P_T120C26bEP_P//05_alignment_correction/jc_evidence.gd
FILE: genome_diff.cpp   LINE: 747
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> STACK TRACE <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Backtrace with 7 stack frames.
breseq(+0x5aab1) [0x55a56ee7dab1]
breseq(+0x10dbeb) [0x55a56ef30beb]
breseq(+0x20ebdd) [0x55a56f031bdd]
breseq(+0x91ebd) [0x55a56eeb4ebd]
breseq(+0x4ec2f) [0x55a56ee71c2f]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2ace6672a505]
breseq(+0x59e99) [0x55a56ee7ce99]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

jeffreybarrick commented 1 year ago

Hi, I just ran into this exact problem myself a few weeks ago. It's now fixed in the current repository version and will be in release v0.37.1 when I'm able to get that out in a week or two.

xiaolinchu92 commented 1 year ago

Hi Jeff, thank you for the quick reply. I got the same error using the current version, v0.37.0. I will try the newer version when it is released. Thanks!

jeffreybarrick commented 1 year ago

This should be fixed in v0.37.1.

You should be able to use the existing merged GD file with no problems.

Let me know if the issue persists.
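
For anyone who wants to inspect a GD file for this kind of collision, here is a minimal Python sketch. In the error above, the two JC entries (IDs 5 and 55) share the same seven junction fields: sequence ID, position, and strand for each side, plus the overlap value. The sketch groups JC lines by those fields and reports any group with more than one entry. The function name find_duplicate_junctions is hypothetical, and the assumption that these seven columns are what make two entries "duplicates" is inferred from the error message, not taken from the breseq source, whose actual check may differ.

```python
from collections import defaultdict


def find_duplicate_junctions(lines):
    """Return {junction_fields: [entry_ids]} for JC entries that collide.

    Assumption (not confirmed against breseq's code): two JC entries count
    as duplicates when columns 4-10 of the tab-separated line are identical.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines, blank lines, and any entry that is not JC evidence.
        if len(fields) < 10 or fields[0] != "JC":
            continue
        entry_id = fields[1]
        # side_1 seq_id/position/strand, side_2 seq_id/position/strand, overlap
        junction = tuple(fields[3:10])
        groups[junction].append(entry_id)
    # Keep only junctions that appear more than once.
    return {junction: ids for junction, ids in groups.items() if len(ids) > 1}
```

To use it on a file, pass an open file handle, for example `with open(path) as fh: print(find_duplicate_junctions(fh))`, where path points at the GD file you want to check. This only reads the file and does not modify it.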