Closed juliofdiaz closed 3 years ago
Can you post the first few lines of the GBF and GFF files? breseq looks for the seq_id
there, and it seems to be encountering something unexpected.
LOCUS EP67_contig_1 40698 bp DNA linear BCT 26-APR-2021
DEFINITION Pseudomonas aeruginosa chromosome, whole genome shotgun sequence.
ACCESSION
VERSION
KEYWORDS WGS.
SOURCE Pseudomonas aeruginosa
ORGANISM Pseudomonas aeruginosa
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales;
Pseudomonadaceae; Pseudomonas.
REFERENCE 1 (bases 1 to 40698)
AUTHORS Diaz Caballero,J.F.
CONSRTM COMBACTE_MAGNET
TITLE Direct Submission
JOURNAL Submitted (26-APR-2021) Department of Zoology, University of
Oxford, 11a Mansfield Road, Oxford, Oxfordshire OX2 9DF, UK
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region 1 40698
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=287
EP67_contig_1 Local region 1 40698 . + . ID=EP67_contig_1:1..40698;Dbxref=taxon:287;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA
EP67_contig_1 . gene 1 94 . + . ID=gene-pgaptmp_000001;Name=pgaptmp_000001;gbkey=Gene;gene_biotype=protein_coding;locus_tag=pgaptmp_000001;partial=true;start_range=.,1
EP67_contig_1 Protein Homology CDS 1 94 . + 1 ID=cds-pgaptmp_000001;Parent=gene-pgaptmp_000001;Name=extdb:pgaptmp_000001;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_000679427.1;locus_tag=pgaptmp_000001;partial=true;product=QacE family quaternary ammonium compound efflux SMR transporter;protein_id=extdb:pgaptmp_000001;start_range=.,1;transl_table=11
EP67_contig_1 . gene 88 927 . + . ID=gene-pgaptmp_000002;Name=sul1;gbkey=Gene;gene=sul1;gene_biotype=protein_coding;locus_tag=pgaptmp_000002
EP67_contig_1 Protein Homology CDS 88 927 . + 0 ID=cds-pgaptmp_000002;Parent=gene-pgaptmp_000002;Name=extdb:pgaptmp_000002;gbkey=CDS;gene=sul1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_000259031.1;locus_tag=pgaptmp_000002;product=sulfonamide-resistant dihydropteroate synthase Sul1;protein_id=extdb:pgaptmp_000002;transl_table=11
EP67_contig_1 . gene 1055 1555 . + . ID=gene-pgaptmp_000003;Name=pgaptmp_000003;gbkey=Gene;gene_biotype=protein_coding;locus_tag=pgaptmp_000003
EP67_contig_1 Protein Homology CDS 1055 1555 . + 0 ID=cds-pgaptmp_000003;Parent=gene-pgaptmp_000003;Name=extdb:pgaptmp_000003;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_004194535.1;locus_tag=pgaptmp_000003;product=GNAT family N-acetyltransferase;protein_id=extdb:pgaptmp_000003;transl_table=11
Ok. It looks like this was fixed/addressed for the GenBank format in #252.
For now, the workaround would be to delete the empty VERSION and ACCESSION lines from the GenBank. This edited GenBank file should work with v0.35.5.
I have a fix in the current repository code for the GenBank problem, but the GFF3 case still fails. After I add a fix for that I will roll a new v0.35.6 version. With that you should be able to use the PGAP-generated files directly in breseq without needing to edit them.
I removed VERSION and accession, so gsk looks like this:
LOCUS EP67_contig_1 40698 bp DNA linear BCT 26-APR-2021
DEFINITION Pseudomonas aeruginosa chromosome, whole genome shotgun sequence.
KEYWORDS WGS.
SOURCE Pseudomonas aeruginosa
ORGANISM Pseudomonas aeruginosa
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales;
Pseudomonadaceae; Pseudomonas.
REFERENCE 1 (bases 1 to 40698)
AUTHORS Diaz Caballero,J.F.
and I get this error:
---> bowtie2 :: version 2.2.6 [/system/software/linux-x86_64/bowtie2/2.2.6/bin/bowtie2]
---> R :: version 3.5.2 [/system/software/centos_7/R/3.5.2__intel-2018/bin/R]
+++ NOW PROCESSING Read and reference sequence file input
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Could not determine format of reference file: /data/zool-combacte-magnet/zool2417/results/pgap/pb1-ca-ci-pi/EP67/EP67_2.gbk
FILE: reference_sequence.cpp LINE: 843
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> STACK TRACE <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Backtrace with 7 stack frames.
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x577f78]
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x5900b2]
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x590285]
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x42dbc0]
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x40af6b]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd189f73505]
/data/zool-combacte-magnet/zool2417/bin/breseq-0.35.5/bin/breseq() [0x41ce47]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I'm not sure what's going on in that case. When I make a header like that one (but faked onto a different sequence file) it works in v0.35.6.
Can you try this new version with your original GenBank file? https://github.com/barricklab/breseq/releases/tag/v0.35.6
If it doesn't work, I might need you to email me the full GenBank file at the address in the breseq command line output header.
Thanks!
Thanks for looking into this. I can report that v 0.35.6 fixes the problem.
Thanks!
I have annotated a genome using NCBI's PGAP. Then I tried using the resulting gff or gbk from this annotation as input for breseq (v 0.35.5), and I get the following errors:
GBK
GFF
I do not get the error if I use the pasta file with the genome. Any clues as to what the problem is? When I use the prokka gff or gbk, I also do not get the error.