barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Error using nanopore reads as input #370

Open Cat-Jane opened 2 months ago

Cat-Jane commented 2 months ago

Hi,

I'm trying to use nanopore reads as input with the latest version of breseq (thank you very much for developing this feature). However, I get the following error:

terminate called after throwing an instance of 'std::out_of_range' what(): key 'reads_were_split' not found

I'm guessing it's not splitting up the long reads, but I'm not sure what to try to fix it, or whether I'm inputting something incorrectly. The full output is below.

Any help would be very much appreciated. Thanks!

Terminal output, using conda to install and run breseq through a WSL2 platform:

(breseq_v0.38.3) catherine@DESKTOP-D76FJMV:~/Kp_oqxR_operon_seqs$ breseq -p --nanopore -j 8 -o Cam1_np_breseq -r KP6870155_oqx_operon.gbk Cam1_barcode05_calls.fastq
================================================================================
breseq 0.38.3 http://barricklab.org/breseq

Active Developers: Barrick JE, Deatherage DE
Contact: jeffrey.e.barrick@gmail.com

breseq is free software; you can redistribute it and/or modify it under the terms the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

Copyright (c) 2008-2010 Michigan State University
Copyright (c) 2011-2022 The University of Texas at Austin

If you use breseq in your research, please cite:

Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151: 165–188.

If you use structural variation (junction) predictions, please cite:

Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C., Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G. (2014) Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq. BMC Genomics 15:1039.
================================================================================
---> bowtie2 :: version 2.4.5 [/home/catherine/anaconda3/envs/breseq_v0.38.3/bin/bowtie2]
---> R :: version 4.3.3 [/home/catherine/anaconda3/envs/breseq_v0.38.3/bin/R]
--- ALREADY COMPLETE Read and reference sequence file input
terminate called after throwing an instance of 'std::out_of_range'
what(): key 'reads_were_split' not found
Aborted

jeffreybarrick commented 2 months ago

Hi! I'll need to look at your input read/reference files to investigate what is going wrong. Are you able to share them? I can share a folder to upload to if that's helpful.

Cat-Jane commented 2 months ago

I think I figured out the problem. I'm looking at a specific region that is ~6 kb, so my reference is only ~6 kb. When I ran some QC on my reads, I discovered that some were longer than that (not sure why; I'll have to look into it). I filtered out the reads longer than my reference, re-ran the above command with the filtered read set, and it worked. Sorry, I should have QC'd the reads first. Thank you for replying so quickly!
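For anyone hitting the same error, the length filter described in this comment could be sketched roughly as follows. This is only an illustration in plain Python, not the tool that was actually used in this thread: the function names, the 4-lines-per-record assumption, and the `max_len` cutoff are all hypothetical choices made for the example.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# One FASTQ record: header line, sequence, separator ("+") line, quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    Assumes the common 4-lines-per-record layout (no line-wrapped sequences)
    and stops at end of file or at the first blank header line.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        sep = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, sep, qual


def filter_by_max_length(records: Iterable[FastqRecord], max_len: int) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence is at most max_len bases long."""
    for record in records:
        if len(record[1]) <= max_len:
            yield record


def write_fastq(records: Iterable[FastqRecord], handle: TextIO) -> int:
    """Write records in 4-line FASTQ format and return how many were written."""
    count = 0
    for record in records:
        handle.write("\n".join(record) + "\n")
        count += 1
    return count


# Tiny in-memory demonstration with made-up reads and a made-up cutoff of 6 bases.
demo_in = io.StringIO("@short\nACGT\n+\nIIII\n@long\nACGTACGTAC\n+\nIIIIIIIIII\n")
demo_out = io.StringIO()
kept = write_fastq(filter_by_max_length(read_fastq(demo_in), max_len=6), demo_out)
```

With real data, the same three functions would be applied to opened input and output files instead of the in-memory strings, with `max_len` set to the length of the reference (about 6000 here).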

jeffreybarrick commented 2 months ago

Glad you got it working on your data.

I'll still check on this, as breseq should be splitting the reads into smaller chunks, and I wouldn't expect this to cause a crash.
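To make the idea of splitting a long read into smaller chunks concrete, here is a minimal sketch. It is not breseq's implementation (breseq is written in C++, and its actual chunk size and read naming are not stated in this thread); the function name `split_read`, the `_part<N>` suffix, and the chunk length are assumptions made purely for illustration.

```python
from typing import Iterator, Tuple


def split_read(name: str, seq: str, qual: str, chunk_len: int) -> Iterator[Tuple[str, str, str]]:
    """Cut one read into consecutive, non-overlapping pieces of at most chunk_len bases.

    The quality string is cut at the same positions so each piece stays a
    valid (sequence, quality) pair. The last piece may be shorter.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    for index, start in enumerate(range(0, len(seq), chunk_len), start=1):
        end = start + chunk_len
        # Hypothetical naming scheme: append a 1-based part number to the read name.
        yield f"{name}_part{index}", seq[start:end], qual[start:end]


# Made-up 10-base read split into pieces of at most 4 bases.
chunks = list(split_read("read1", "ACGTACGTAC", "IIIIIIIIII", chunk_len=4))
```

In this toy case the read yields three pieces of 4, 4, and 2 bases, each with its matching slice of the quality string.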

Cat-Jane commented 2 months ago

I'm happy to supply my raw reads and reference if that would be helpful. If so, a folder to upload to would be great.

jeffreybarrick commented 2 months ago

Yes, it would be helpful, just to make sure I can reproduce the problem.

Please email me at the address in the breseq run header, so I can share a folder with your email.