barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

trouble with test data set, error: libc++abi.dylib: terminating with uncaught exception of type std::domain_error: type must be number, but is null Abort trap: 6 #169

Closed by kvanraay 6 years ago

kvanraay commented 6 years ago

Hello,

I'm new to breseq. I downloaded it from github, unzipped it and dragged and dropped it from my downloads folder to the folder I wanted it in (don't judge - I'm also new to bioinformatics and I had trouble navigating to my $PATH without being able to use miniconda). Within the breseq folder I created the test_drive folder and stored the 3 files: NC_012967.gbk, SRR030257_1.fastq, SRR030257_2.fastq. Note, I tried both the latest version of breseq (breseq-0.33.0-MacOSX-10.9+.tar.gz) and the most recent previous version (breseq-0.32.1-MacOSX-10.9.tar.gz).

I executed the test script using:

D-10-18-243-161:test_drive IamUnicorn$  ../bin/breseq /Users/IamUnicorn/Documents/Grad\ School/Projects/Emulsion/IlluminaSequences/breseq/test_drive/ -r NC_012967.gbk SRR030257_1.fastq SRR030257_2.fastq

And this is the output I got:

================================================================================
breseq 0.33.0  revision 5b4d9b78ed41   http://barricklab.org/breseq

Active Developers: Barrick JE, Deatherage DE
Contact:           <jeffrey.e.barrick@gmail.com>

breseq is free software; you can redistribute it and/or modify it under the
terms the GNU General Public License as published by the Free Software 
Foundation; either version 2, or (at your option) any later version.

Copyright (c) 2008-2010 Michigan State University
Copyright (c) 2011-2017 The University of Texas at Austin

If you use breseq in your research, please cite:

  Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations
  in laboratory-evolved microbes from next-generation sequencing
  data using breseq. Methods Mol. Biol. 1151: 165–188.

If you use structural variation (junction) predictions, please cite:

  Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C.,
  Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G. 
  (2014) Identifying structural variation in haploid microbial genomes 
  from short-read resequencing data using breseq. BMC Genomics 15:1039.
================================================================================
---> bowtie2  :: version 2.3.4.1 [/Users/IamUnicorn/miniconda3/bin/bowtie2]
---> R        :: version 3.5.1 [/usr/local/bin/R]
+++   NOW PROCESSING Read and reference sequence file input
  READ FILE::
    Converting/filtering FASTQ file...
    Original base quality format: ILLUMINA_1.3+ New format: SANGER
    Original reads: 0 bases: 0
    Filtered reads: none
    Analyzed reads: 0 bases: 0
  READ FILE::SRR030257_1
    Converting/filtering FASTQ file...
    Original base quality format: SANGER New format: SANGER
    Original reads: 3800180 bases: 136806480
    Filtered reads:      19 bases:       684 (≥50.0% bases N)
    Filtered reads:   60951 bases:   2194236 (≥90.0% same base)
    Analyzed reads: 3739210 bases: 134611560
  READ FILE::SRR030257_2
    Converting/filtering FASTQ file...
    Original base quality format: SANGER New format: SANGER
    Original reads: 3800180 bases: 136806480
    Filtered reads:     369 bases:     13284 (≥50.0% bases N)
    Filtered reads:   47630 bases:   1714680 (≥90.0% same base)
    Analyzed reads: 3752181 bases: 135078516
  ::TOTAL::
    Original reads: 7600360 bases: 273612960
    Analyzed reads: 7491391 bases: 269690076
[samtools] faidx ./data/reference.fasta
  REFERENCE: NC_012967
  LENGTH: 4629812
libc++abi.dylib: terminating with uncaught exception of type std::domain_error: type must be number, but is null
Abort trap: 6

I would appreciate any advice or pointers.

Thanks very much.

kvanraay commented 6 years ago

Update: I put breseq/bin into my $PATH and ran

breseq -r NC_012967.gbk SRR030257_1.fastq SRR030257_2.fastq

I'm running the test_drive data. It's taking a while, but it seems to be working: the first empty READ FILE from my previous run is no longer there.

kvanraay commented 6 years ago

OK, so I've fixed my problem, which appears to have stemmed from an improper installation. The test data is running just fine now. The main takeaway as a terminal/bash noob: after unzipping the breseq download, move it to the directory where both R and bowtie are (in my case, this was /usr/local/bin/miniconda3/bin), then point to the breseq program using export PATH=$PATH:/usr/local/bin/miniconda3/bin/breseq-0.33.0-MacOSX-10.9+/bin {my awful $PATH here}

You can find your awful path by typing echo $PATH in your terminal.
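For anyone following along, the PATH step above can be sketched as below. The install location in `BRESEQ_BIN` is a hypothetical example, not the path used in this thread; substitute the `bin` directory of wherever you unpacked breseq. The change only lasts for the current terminal session unless you also add the `export` line to your shell startup file.

```shell
# Hypothetical install location: replace with the bin directory of your own breseq download.
BRESEQ_BIN="$HOME/tools/breseq-0.33.0-MacOSX-10.9+/bin"

# Append the breseq bin directory to PATH for the current session.
export PATH="$PATH:$BRESEQ_BIN"

# Confirm that the shell can now locate the breseq executable.
if command -v breseq >/dev/null 2>&1; then
  echo "breseq found at: $(command -v breseq)"
else
  echo "breseq not found on PATH; check that $BRESEQ_BIN exists"
fi
```

Note the order of the characters in `$PATH`: the dollar sign comes first, so the existing value is kept and the new directory is added to the end.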

jeffreybarrick commented 6 years ago

Glad you solved it!

I think the error in your initial command could have been from not having the output path option flag -o before the directory /Users/IamUnicorn/Documents/Grad\ School/Projects/Emulsion/IlluminaSequences/breseq/test_drive/. breseq interpreted that directory as an (empty) input FASTQ file since it didn't have an option designator in front of it. I'll have to investigate making it give a better error message for that case.
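As a rough illustration of the point above, here is a small pre-flight check that could be run before calling breseq. The helper name `check_read_files` and the `OUT_DIR` variable are made up for this sketch and are not part of breseq; the only breseq options used are `-o` and `-r`, as they appear in this thread.

```shell
# Hypothetical helper (not part of breseq): return non-zero if any argument
# is not a regular, non-empty file, e.g. a directory passed by mistake.
check_read_files() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -s "$f" ]; then
      echo "not a usable read file: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Intended use (commented out so the sketch runs without breseq installed).
# OUT_DIR is a hypothetical output directory; it is passed with -o so that
# it is not mistaken for an input read file.
# OUT_DIR="test_drive_output"
# check_read_files SRR030257_1.fastq SRR030257_2.fastq &&
#   breseq -o "$OUT_DIR" -r NC_012967.gbk SRR030257_1.fastq SRR030257_2.fastq
```

With a check like this, the original command would have stopped at the directory argument instead of reporting an empty READ FILE and aborting later.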

jciemniecki commented 2 years ago

Hello,

I am getting the same error message, but unfortunately none of the advice posted here so far has fixed it. Here is my command:

-> breseq -r Pseudomonas_aeruginosa_UCBPP-PA14_genome.gbk PA14_dphz12_SRR11561537.1.fastq

and the output:

---> bowtie2  :: version 2.4.5 [/Users/john/opt/anaconda3/envs/breseq/bin/bowtie2]
---> R        :: version 3.6.3 [/Users/john/opt/anaconda3/envs/breseq/bin/R]
+++   NOW PROCESSING Read and reference sequence file input
  READ FILE::PA14_dphz12_SRR11561537.1
    Converting/filtering FASTQ file...
    Original base quality format: SANGER New format: SANGER
    Original reads: 2911505 bases: 46584080
    Filtered reads: 2911505 bases: 46584080 (<18 bases long)
    Analyzed reads:       0 bases:        0
  ::TOTAL::
    Original reads: 2911505 bases: 46584080
    Analyzed reads:       0 bases:        0
[samtools] faidx ./data/reference.fasta
  REFERENCE: NC_008463
  LENGTH: 6537648
libc++abi: terminating with uncaught exception of type std::domain_error: type must be number, but is null
Abort trap: 6

I have run the test drive successfully just before this, so I believe my install/setup is correct. Any guidance is appreciated!

jeffreybarrick commented 2 years ago

Hi,

It looks like all of your reads are being filtered out, so breseq is trying to analyze zero reads.

Can you post the first few lines of your input FASTQ file PA14_dphz12_SRR11561537.1.fastq so we can see why they are being rejected?

jciemniecki commented 2 years ago

Thanks for the help. These data were downloaded from GenBank in their SRA archive format, and I converted them to FASTQ using the default settings of the 'fastq-dump' tool in the SRA toolkit. Here are the first few lines of the file:

@SRR11561537.1.1 1 length=16
GCTCATGANGAGGATA
+SRR11561537.1.1 1 length=16
AAAAAEEE#AAAAEEE
@SRR11561537.1.2 2 length=16
GCTCATGANGAGGATA
+SRR11561537.1.2 2 length=16
AAAAAEEE#AAAAEEE
@SRR11561537.1.3 3 length=16
GCTCATGAAGAGGATA
+SRR11561537.1.3 3 length=16
AAAAAEEAAAAAAAEE
@SRR11561537.1.4 4 length=16
GCTCATGAAGAGGATA
+SRR11561537.1.4 4 length=16
AAAAAEE/AAAAAEEE
@SRR11561537.1.5 5 length=16
GCTCATGAAGAGGATA
+SRR11561537.1.5 5 length=16
AAAAAEEEAAAAAEEE

jeffreybarrick commented 2 years ago

These reads are extremely short. breseq is throwing them all out because they are under its normal minimum length requirement of 18 bases (as it says in the command line output).

You can make that cutoff smaller if you want, so they will be used, by adding --read-min-length 0 to your command. However, I don't think you'll see very good results in terms of calling mutations with such short reads. The minimum that we've ever really used breseq with is 36 bases.
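To see in advance how many reads would fall under the length cutoff, something like the following could be used. This is only a sketch: the function name `count_short_reads` is made up here, and it assumes a plain four-line-per-record FASTQ file (no wrapped sequence lines), as in the excerpt posted above.

```shell
# Hypothetical helper (not part of breseq): count reads shorter than a
# minimum length. Assumes 4 lines per FASTQ record, sequence on line 2.
# Usage: count_short_reads FILE [MIN_LENGTH]   (MIN_LENGTH defaults to 18)
count_short_reads() {
  awk -v min="${2:-18}" '
    NR % 4 == 2 { total++; if (length($0) < min + 0) short++ }
    END { printf "%d of %d reads shorter than %d bases\n", short, total, min }
  ' "$1"
}

# Intended use with the file from this thread (commented out so the sketch
# runs without the data or breseq present):
# count_short_reads PA14_dphz12_SRR11561537.1.fastq
# breseq --read-min-length 0 -r Pseudomonas_aeruginosa_UCBPP-PA14_genome.gbk PA14_dphz12_SRR11561537.1.fastq
```

If nearly every read is reported as short, lowering the cutoff will let breseq run, but the caveat above about poor mutation calls from very short reads still applies.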

jciemniecki commented 2 years ago

OK, thanks for your help and insight!