LANL-Bioinformatics / EDGE

EDGE is a highly adaptable bioinformatics platform that allows laboratories to quickly analyze and interpret genomic sequence data.
https://lanl-bioinformatics.github.io/EDGE/
GNU General Public License v3.0

prokka contig name length error #16

Closed. donutbrew closed this issue 7 years ago.

donutbrew commented 8 years ago

With longer project names, the contig names produced by idba are too long for prokka (devel branch).

Here is an example:

from the Annotation.log:

     parallel: Warning: $SHELL not set. Using /bin/sh.
     tbl2asn.orig -
     [09:41:18] Please rename your contigs or use --centre XXX to generate clean contig names.

from the other Annotation log file:

     [10:03:15] Command: --quiet --force --locustag B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris --prefix B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris --cpus 8 --outdir /opt/edge/edge_ui/EDGE_output/58b7d067c00e01d4c957c4610495dbcc/AssemblyBasedAnalysis/Annotation --kingdom Viruses /opt/edge/edge_ui/EDGE_output/58b7d067c00e01d4c957c4610495dbcc/AssemblyBasedAnalysis/B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris_contigs_700up.fa
     [10:03:15] Looking for 'aragorn' - found /opt/edge/bin/aragorn
     [10:03:15] Determined aragorn version is 1.2
     [10:03:15] Looking for 'barrnap' - found /opt/edge/bin/barrnap
     [10:03:15] Determined barrnap version is 0.4
     [10:03:15] Looking for 'blastp' - found /opt/edge/bin/blastp
     [10:03:15] Determined blastp version is 2.2
     [10:03:15] Looking for 'cmpress' - found /opt/edge/bin/cmpress
     [10:03:15] Determined cmpress version is 1.1
     [10:03:15] Looking for 'cmscan' - found /opt/edge/bin/cmscan
     [10:03:15] Determined cmscan version is 1.1
     [10:03:15] Looking for 'egrep' - found /bin/egrep
     [10:03:15] Looking for 'find' - found /bin/find
     [10:03:15] Looking for 'grep' - found /bin/grep
     [10:03:15] Looking for 'hmmpress' - found /opt/edge/bin/hmmpress
     [10:03:15] Determined hmmpress version is 3.1
     [10:03:15] Looking for 'hmmscan' - found /opt/edge/bin/hmmscan
     [10:03:15] Determined hmmscan version is 3.1
     [10:03:15] Looking for 'less' - found /bin/less
     [10:03:15] Looking for 'makeblastdb' - found /opt/edge/bin/makeblastdb
     [10:03:15] Determined makeblastdb version is 2.2
     [10:03:15] Looking for 'minced' - found /opt/edge/thirdParty/prokka-1.11/bin/../binaries/linux/../common/minced
     [10:03:16] Determined minced version is 1.6
     [10:03:16] Looking for 'parallel' - found /opt/edge/bin/parallel
     [10:03:16] Determined parallel version is 20140622
     [10:03:16] Looking for 'prodigal' - found /opt/edge/bin/prodigal
     [10:03:16] Determined prodigal version is 2.60
     [10:03:16] Looking for 'prokka-genbank_to_fasta_db' - found /opt/edge/thirdParty/prokka-1.11/bin/../binaries/linux/../../bin/prokka-genbank_to_fasta_db
     [10:03:16] Looking for 'sed' - found /bin/sed
     [10:03:16] Looking for 'tbl2asn' - found /opt/edge/scripts/tbl2asn
     [10:03:16] Determined tbl2asn version is 24.9
     [10:03:16] Using genetic code table 1.
     [10:03:16] Loading and checking input file: /opt/edge/edge_ui/EDGE_output/58b7d067c00e01d4c957c4610495dbcc/AssemblyBasedAnalysis/B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris_contigs_700up.fa
     [10:03:16] Contig ID must <= 37 chars long: B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris_0000
     [10:03:16] Please rename your contigs or use --centre XXX to generate clean contig names.

chienchi commented 8 years ago

[Screenshot: 2016-08-23 14:56:44]

This is a known issue. The EDGE GUI limits the project name to a maximum of 30 characters. Did you run the job from the command line?
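For reference, the rejected contig ID in the log above is the 43-character project name plus a `_0000` suffix, 48 characters in total, against the 37-character limit that prokka reports. Below is a minimal Python sketch for scanning a contig FASTA file for over-long IDs before annotation. The function name and the example headers are hypothetical, and the limit of 37 is taken from the log message rather than from the prokka source, so it may differ in other versions.

```python
from typing import Iterable, List, Tuple

# Limit taken from the log line "Contig ID must <= 37 chars long";
# assumed here, and possibly different in other prokka versions.
MAX_CONTIG_ID_LEN = 37


def find_long_contig_ids(
    fasta_lines: Iterable[str], max_len: int = MAX_CONTIG_ID_LEN
) -> List[Tuple[str, int]]:
    """Return (contig_id, length) for each FASTA header whose ID exceeds max_len."""
    too_long = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # The ID is the first whitespace-delimited token after '>'.
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to measure
        contig_id = fields[0]
        if len(contig_id) > max_len:
            too_long.append((contig_id, len(contig_id)))
    return too_long


if __name__ == "__main__":
    # Hypothetical headers; the first ID is the one from the log.
    example = [
        ">B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris_0000",
        "ACGT",
        ">short_project_0000",
        "ACGT",
    ]
    for contig_id, n in find_long_contig_ids(example):
        print(f"{contig_id}: {n} chars (limit {MAX_CONTIG_ID_LEN})")
```

With a 30-character project name and the same five-character suffix, the ID is 35 characters, which fits under the limit.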

donutbrew commented 8 years ago

Good point. It was from Batch input.

chienchi commented 8 years ago

Thanks! Added the length check and a warning message in commit bb5405013349c2f658515eb22b117e13497e15de.
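For anyone preparing batch input by hand, a check of this kind could look roughly like the Python sketch below. This is not the code from the commit: the function name and the warning text are hypothetical, and the 30-character limit is the GUI limit mentioned earlier in this thread.

```python
from typing import Dict, Iterable

# GUI limit on project name length mentioned in this thread.
MAX_PROJECT_NAME_LEN = 30


def check_project_names(
    names: Iterable[str], max_len: int = MAX_PROJECT_NAME_LEN
) -> Dict[str, str]:
    """Map each over-long project name to a warning message (hypothetical wording)."""
    warnings = {}
    for name in names:
        if len(name) > max_len:
            warnings[name] = (
                f"Project name is {len(name)} characters long; "
                f"the maximum is {max_len}."
            )
    return warnings


if __name__ == "__main__":
    batch = ["B227_CARI_HaptRight_16_0003_RAB_DNA_Covaris", "B227_short"]
    for name, message in check_project_names(batch).items():
        print(f"WARNING [{name}]: {message}")
```

Running the check before submitting the batch catches the problem earlier than the annotation step does.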