flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

ERROR: Pantagruel pipeline task 1: failed. #31

Status: Closed (mattbawn closed this issue 4 years ago)

mattbawn commented 4 years ago

Hi Florent,

Task 0 now completes successfully, but I get the following:

Pantagruel pipeline task 0: complete.
[2019-11-29 12:14:31] Pantagruel pipeline task 1: classify protein sequences into homologous families.
Create new task folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/01.seqdb'
[2019-11-29 12:14:57] -- 134 proteomes in dataset
[2019-11-29 12:14:58] -- 623854 proteins in dataset
[2019-11-29 12:15:09] -- 623854 non-redundant protein ids in dataset
                      -- Perform first protein clustering step (100% prot identity clustering with clusthash algorithm)
Traceback (most recent call last):
  File "/opt/software/pantagruel/scripts/extract_metadata_from_gbff.py", line 372, in <module>
    main(nfldirassemb, dirassemblyinfo, output, defspename, nfdhandmetaraw, nfdhandmetacur, nfdhanddbxref)
  File "/opt/software/pantagruel/scripts/extract_metadata_from_gbff.py", line 218, in main
    taxid = dict(dbxref.split(':') for dbxref in dmetadata.get('db_xref',{}).get(assemb,na).strip(' "').split(';'))['taxon']
ValueError: dictionary update sequence element #0 has length 1; 2 is required
/opt/software/pantagruel/scripts/pipeline/pantagruel_pipeline_01_homologous_seq_families.sh: line 44:  9138 Illegal instruction     mmseqs createdb ${allfaarad}.nrprotids.faa ${allfaarad}.mmseqsdb &> ${mmlog0}
/opt/software/pantagruel/scripts/pipeline/pantagruel_pipeline_01_homologous_seq_families.sh: line 45:  9139 Illegal instruction     mmseqs clusthash --min-seq-id 1.0 ${allfaarad}.mmseqsdb ${allfaarad}.clusthashdb_minseqid100 &>> ${mmlog0}
/opt/software/pantagruel/scripts/pipeline/pantagruel_pipeline_01_homologous_seq_families.sh: line 46:  9140 Illegal instruction     mmseqs clust ${allfaarad}.mmseqsdb ${allfaarad}.clusthashdb_minseqid100 ${allfaarad}.clusthashdb_minseqid100_clust &>> ${mmlog0}
/opt/software/pantagruel/scripts/pipeline/pantagruel_pipeline_01_homologous_seq_families.sh: line 48:  9144 Illegal instruction     mmseqs createseqfiledb ${allfaarad}.mmseqsdb ${allfaarad}.clusthashdb_minseqid100_clust ${allfaarad}.clusthashdb_minseqid100_clusters &>> ${mmlog0}
ERROR: First protein clustering step failed; please inestigate error reports in '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/logs/mmseqs/mmseqs-0-identicalprot-clusthash.log'
ERROR: Pantagruel pipeline task 1: failed.

However, '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database/logs/mmseqs/mmseqs-0-identicalprot-clusthash.log' is empty.

Thanks,

Matt

flass commented 4 years ago

Hi Matt,

I think there is a mix of errors here:

1) The first block, from the Python script extract_metadata_from_gbff.py, actually belongs to task 00, but the error was not caught there. It seems to come from a missing /db_xref="taxon:xxx" field in one of the GenBank files. This field, which holds the NCBI taxid, should be added automatically earlier in the automated annotation process, unless you provide your own GenBank file and the field happens to be missing from it. In commit db0409b, I changed the logging to provide more detail and made the pipeline catch errors in extract_metadata_from_gbff.py. Maybe you can re-run task 00 to see what's wrong. Ultimately, I could easily provide a default value for the taxid when it is not documented, but that is not a very clean solution. Please let me know.

2) Illegal instruction is a nasty error message that you don't want to see; it means something is wrong with the way mmseqs runs on your machine, irrespective of the shell and pipeline around it. Can you check that mmseqs is correctly installed and up to date? It should be if it was installed through Homebrew. You can dig up the installation details with:

mmseqs -h | grep Version
which mmseqs
brew search mmseqs2
brew info mmseqs2

If it keeps throwing that kind of error, I suggest you compile the MMseqs program yourself from source (easy), available here: https://github.com/soedinglab/MMseqs2
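
Coming back to point 1: the ValueError in the traceback is what Python raises when dict() receives an item that does not split into a key and a value. The sketch below reproduces that and shows one tolerant way to read the taxid. It is only an illustration, not the code in extract_metadata_from_gbff.py: the function name parse_taxid, the placeholder string 'NA' and the taxid 12345 are all assumptions made for the example.

```python
def parse_taxid(db_xref_value, default=None):
    """Return the 'taxon' entry of a db_xref string, or `default` if absent.

    Hypothetical helper: expects text such as 'taxon:12345;other:abc'.
    """
    pairs = {}
    for item in db_xref_value.strip(' "').split(';'):
        key, sep, value = item.partition(':')
        if sep:  # ignore entries that are not of the form key:value
            pairs[key.strip()] = value.strip()
    return pairs.get('taxon', default)


if __name__ == '__main__':
    # Same pattern as the failing line, fed an assumed placeholder value.
    try:
        dict(entry.split(':') for entry in 'NA'.split(';'))
    except ValueError as err:
        print('strict parsing fails:', err)

    print(parse_taxid('"taxon:12345"'))  # taxid found
    print(parse_taxid('NA'))             # no taxon entry, falls back to default
```

Returning a default here only hides the gap; whether a missing taxid should stop task 00 or be filled with a default value is the design question raised in point 1.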

flass commented 4 years ago

I suggest you now use the Dockerfile to build a Docker image, which should fix this kind of runtime problem; see #11.
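
Before rebuilding anything, it may be worth checking one common cause of Illegal instruction: a binary compiled for CPU extensions that the processor does not have. The sketch below reads the feature flags Linux reports in /proc/cpuinfo and lists which of a few wanted flags are missing. It assumes an x86 Linux machine, and the flag names 'sse4_1' and 'avx2' are assumptions about what the installed mmseqs build might need, so adjust them to your build.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(':')
        if sep and name.strip() == 'flags':
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text, wanted=('sse4_1', 'avx2')):
    """Return the wanted flags (assumed names) that the CPU does not report."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == '__main__':
    try:
        with open('/proc/cpuinfo') as handle:
            print('missing flags:', missing_flags(handle.read()))
    except OSError:
        print('/proc/cpuinfo is not available on this system')
```

If a flag is reported missing on the node where the job actually ran, a build targeting a simpler instruction set, or one compiled on that node, is the likely way out. On a cluster, run the check on the compute node and not on the login node, since the two can have different CPUs.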