flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

ERROR: something went wrong when modifying the GenBank flat file #29

Closed mattbawn closed 4 years ago

mattbawn commented 4 years ago

Hi Florent,

Thank you for your previous fix(es).

I am getting the following:

This is Pantagruel pipeline version 4867c048788ba7ec92dfd5ae9148d0349411151c using source code from repository '/opt/software/pantagruel'
# will run tasks: 0 1 2 3 4 5 6 7 8 9
[2019-11-15 11:39:09] Pantagruel pipeline task 0: fetch public genome data from NCBI sequence databases and annotate private genomes.
Create new task folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/00.input_data'
[2019-11-15 11:39:10] extract assembly data from folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/genomes'
found 134 contig files (raw genome assemblies) in /nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/genomes/contigs/
Warning: 'prokka' command was not available from the PATH; this may be fine, as long as none of you custome genomes need annotating
skip building the reference BLAST db
[2019-11-15 11:39:11] ragout_100
found annotation folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/genomes/annotation/ragout_100' ; skip annotation of contigs in '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/genomes/contigs/ragout_100.fa'
fix annotation to integrate region information into GFF files
fix annotation to integrate taxid information into GBK files

Traceback (most recent call last):
  File "/opt/software/pantagruel/scripts/add_taxid_feature2prokkaGBK.py", line 27, in <module>
    straininfo = dstraininfo[strain]
KeyError: 'strain'
ERROR: something went wrong when modifying the GenBank flat file /nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/00.input_data/annotation/ragout_100/PROKKA_11122019.gbf
ERROR: Pantagruel pipeline task 0: failed.

Is Pantagruel expecting the strain info in strain_infos_database.txt to also be in the annotation?

If so, can I use Pantagruel to annotate the assemblies? (This may be easier, as it seems to be quite specific about the files it expects.)

Thanks,

Matt

flass commented 4 years ago

Hi Matt,

Yes, Pantagruel expects a strain_infos_database.txt file whenever you provide custom assemblies with option -a, not only when you annotate them within the Pantagruel pipeline. The pipeline relies on this file to tag the genomes and link basic metadata (species and strain name).

And yes, you can indeed annotate the genomes directly within the pipeline, as long as the prokka command is available in your PATH (ideally prokka v1.14; v1.13 should work too).
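
A quick way to confirm the prerequisite above before launching task 0 is to check whether a prokka executable is visible on the PATH. The sketch below relies only on Python's standard shutil.which; the helper name command_available is made up for illustration and is not part of Pantagruel, and it does not check the prokka version.

```python
import shutil


def command_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    # shutil.which returns the full path of the executable, or None if absent
    return shutil.which(name) is not None


if __name__ == "__main__":
    if command_available("prokka"):
        print("prokka found at", shutil.which("prokka"))
    else:
        print("prokka not found on PATH; in-pipeline annotation would not be possible")
```

If this reports that prokka is missing, the warning about the prokka command shown in the task 0 log above is the expected result.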

mattbawn commented 4 years ago

Hi Florent,

I'm sorry, perhaps I wasn't clear.

By "Is Pantagruel expecting the strain info in strain_infos_database.txt to also be in the annotation?" I mean: is the strain tag in strain_infos_database.txt expected to be present in the *.GFF and *.GBK files?

Thanks,

Matt

flass commented 4 years ago

OK, I get you. Yes, the script add_taxid_feature2prokkaGBK.py was expecting the strain tag in strain_infos_database.txt to be present in the *.GBK file; this is no longer the case as of commit 7f68302. Please try again and see if it works.
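
For readers hitting the same traceback: a KeyError is what a plain dictionary lookup raises when the key is missing. Below is a minimal sketch of that failure mode and of a more forgiving lookup. It is not the actual code of add_taxid_feature2prokkaGBK.py or of commit 7f68302; the dictionary contents and the helper name lookup_strain_info are invented for illustration.

```python
def lookup_strain_info(dstraininfo, strain):
    """Return the metadata recorded for `strain`, or None if it is unknown."""
    # dict.get returns None instead of raising KeyError when the key is missing
    return dstraininfo.get(strain)


# Hypothetical metadata table keyed by strain tag (values are made up).
dstraininfo = {"ragout_100": {"species": "Example species", "taxid": "0"}}

try:
    dstraininfo["strain"]  # key absent: raises KeyError, as in the traceback
except KeyError as err:
    print("lookup failed for key", err)

print(lookup_strain_info(dstraininfo, "strain"))      # prints None
print(lookup_strain_info(dstraininfo, "ragout_100"))  # prints the stored metadata
```

Returning None leaves the caller free to fall back on other metadata or to fail with a clearer message than a bare KeyError.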

mattbawn commented 4 years ago

Hi again,

Thanks for doing this. I now get:

This is Pantagruel pipeline version 37ff955f16336d331307812ee5c6231b6b793600 using source code from repository '/opt/software/pantagruel'
# will run tasks: 0 1 2 3 4 5 6 7 8 9
[2019-11-28 13:17:34] Pantagruel pipeline task 0: fetch public genome data from NCBI sequence databases and annotate private genomes.
Create new task folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/00.input_data'
/opt/software/pantagruel/scripts/pipeline/pantagruel_pipeline_00_fetch_data.sh: line 488: syntax error: unexpected end of file
ERROR: Pantagruel pipeline task 0: failed.

Thanks

Matt

flass commented 4 years ago

Oh, that was bad of me not to test this after the recent changes. I introduced the bug in commit 0368237; it is now fixed in feaa9e0. Thank you for reporting.
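
An "unexpected end of file" syntax error like the one reported above can usually be caught without running the pipeline, because bash -n parses a script and reports syntax errors without executing any of its commands. A minimal sketch, assuming bash is installed on the machine; the helper name bash_syntax_ok is hypothetical and not part of Pantagruel.

```python
import subprocess
import sys


def bash_syntax_ok(script_path):
    """Return (ok, message); ok is True when `bash -n` parses the script cleanly."""
    # -n makes bash read and parse the script without executing any commands
    result = subprocess.run(
        ["bash", "-n", script_path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    # Usage: pass one or more shell script paths on the command line
    for path in sys.argv[1:]:
        ok, message = bash_syntax_ok(path)
        print(path, "OK" if ok else "SYNTAX ERROR: " + message)
```

Run against the scripts under scripts/pipeline/ after a change, a check of this kind would flag an unterminated if or loop block before any task starts.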