barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Error: Directory nonexistent. Code 512 #350

Closed RpfR2000 closed 1 year ago

RpfR2000 commented 1 year ago

Hi -- first off, thanks for making this software publicly available! It is very cool.

I'm getting the following error:

sh: 1: cannot create /output/evidence/512.r.log: Directory nonexistent
Creating coverage plot for region: contig_36:1-2780
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error running command:
[system] R --vanilla < "/plot_coverage.r" > "/output/evidence/512.r.log" --args in_file="/output/evidence/512_1.coverage.tab" out_file="/output/evidence/contig_1.overview.png" pdf_output=0 total_only=0 window_start=1 window_end=738339 avg_coverage=0.0 fixed_coverage_scale=0
Result code: 512
FILE: libbreseq/common.h LINE: 1411

This error occurred during mutation annotation in polymorphism mode (-p, -o, and -j were the only flags I used in the original command). I tried running the command that is causing the error as a standalone command and it worked fine, which might mean it is an issue with my installation of breseq. I'm using bowtie2 version 2.4.5, R version 4.2.3, and breseq version 0.38.1.

On a side note, I am getting a few warnings. Should I be concerned about these? They only apply to a few contigs. I'm afraid I'm not informed enough about genomics to know whether these are a problem:

Failed to fit coverage distribution for some reference sequences. This may degrade the quality of predicting mutations from new sequence junctions (JC evidence). You may want to set --deletion-coverage-propagation-cutoff to improve the quality of deletion prediction (MC evidence).

Insufficient coverage to call mutations for some reference sequences. Set either the --targeted-sequencing or --contig-reference option if you want mutations called on these reference sequences.

Thanks!

jeffreybarrick commented 1 year ago

I can think of a few possibilities for the error:

1) You ran out of hard drive space in the middle of the breseq run.
2) You ran breseq in different ways as users with different permissions on the same output folder.
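Both possibilities can be checked before rerunning. The following is a minimal sketch, not part of breseq: the function name `check_output_dir` and the `min_free_gb` threshold are hypothetical, and the 10 GB default is an arbitrary placeholder rather than a documented breseq requirement.

```python
import os
import shutil
from pathlib import Path


def check_output_dir(output_dir, min_free_gb=10.0):
    """Return a list of human-readable problems found with output_dir.

    Hypothetical helper; min_free_gb is an assumed threshold, not a
    value taken from the breseq documentation.
    """
    problems = []
    out = Path(output_dir)

    if not out.is_dir():
        problems.append(f"{out} does not exist or is not a directory")
        return problems

    # Possibility 1: the disk filled up part-way through the run.
    free_gb = shutil.disk_usage(out).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free on the volume holding {out}")

    # Possibility 2: files or folders left by another user are not writable.
    for root, dirs, files in os.walk(out):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.access(path, os.W_OK):
                problems.append(f"not writable by current user: {path}")

    return problems
```

An empty list means neither problem was detected at the time of the check; it does not rule out the disk having been full earlier, during the failed run.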

Probably, it would be best to delete all of the output and start again from the beginning.
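A clean restart could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the file names in the usage note are placeholders, and `-r` (the reference option) is the standard breseq flag but was not quoted in the thread, which mentions only `-p`, `-o`, and `-j`.

```python
import shutil
from pathlib import Path


def reset_output_dir(output_dir):
    """Delete output_dir and everything in it, then recreate it empty."""
    out = Path(output_dir)
    if out.exists():
        shutil.rmtree(out)  # irreversible: removes all previous breseq output
    out.mkdir(parents=True)
    return out


def build_breseq_command(reference, reads, output_dir, threads=4):
    """Assemble a polymorphism-mode breseq command as an argument list."""
    return [
        "breseq",
        "-p",                     # polymorphism mode
        "-j", str(threads),       # number of processors
        "-o", str(output_dir),    # output directory
        "-r", str(reference),     # reference sequence (assumed, see lead-in)
        *[str(r) for r in reads],
    ]
```

With placeholder file names, `build_breseq_command("reference.gbk", ["reads.fastq"], reset_output_dir("output"))` gives a list that can be passed to `subprocess.run(..., check=True)`, run as the same user each time so that file permissions stay consistent.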

To make sure your environment is consistent, you might also try installing breseq through Bioconda.

The warnings about failing to fit the coverage distribution and having insufficient coverage seem to indicate that either you don't have enough reads for breseq to call mutations, or your reference sequence is not similar enough to the genome of the microbe you sequenced for the reads to be mapped.

RpfR2000 commented 1 year ago

Got it, thanks! I wouldn't be surprised if it's the first option, as the run took ~12 hours. Is that normal for polymorphism mode? I ran the same command in consensus mode and it finished in ~1-1.5 hours.

jeffreybarrick commented 1 year ago

That's a really big discrepancy... I think it points to something unexpected about the data, probably that it is very divergent from the reference genome.

Is there good coverage of the reference genome when you look at the summary output for consensus mode?

RpfR2000 commented 1 year ago

Okay, so the timing issue was entirely due to the output directory. When I ran in consensus mode, I let the output be the root directory, whereas for some reason I set the output directory to a data drive when using polymorphism mode. They're roughly equivalent in terms of computation time when outputting to the same directory. Coverage is really good (>95%) so that wasn't the issue. Sorry for the confusion!
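For anyone comparing candidate output locations, a rough write-speed check can be sketched as below. The function name is hypothetical, and sequential write throughput is only a crude proxy: breseq also reads and writes many intermediate files, so actual run times may differ by more or less than this number suggests.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64):
    """Write size_mb of zeros to a temporary file in directory; return MB/s."""
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to the device before timing stops
        elapsed = time.perf_counter() - start
    # The temporary file is deleted automatically when the block exits.
    return size_mb / elapsed
```

Calling it once for each directory under consideration gives two numbers to compare; a large gap would be consistent with the slowdown described above.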