Arkadiy-Garber / MagicLamp

A platform for targeted annotation of (meta)genomic and (meta)transcriptomic datasets using HMM sets.
GNU General Public License v3.0

Error when running LithoGenie #11

Open seanmcallister opened 1 year ago

seanmcallister commented 1 year ago

Error from a run using the conda installation. The input was a .faa file with ORFs from Anvio.

(magiclamp) sean-mini-server:2022_PCC_ManuscriptWork sean_server$ MagicLamp.py LithoGenie -bin_dir bins -bin_ext faa -out heme_LithoGenie -t 4 --orfs --makeplots --all_results
checking arguments
.
.
.
All required arguments provided!

reading in HMM bitscore cut-offs...
...
starting main pipeline...
analyzing All_proteins_fullannotation_HEMEGENES.faa: 99%
Identifying genomic proximities and putative operons
Traceback (most recent call last):
  File "/Users/sean_server/software/MagicLamp/MagicLamp.py", line 30, in <module>
    LithoGenie.main()
  File "/Users/sean_server/software/MagicLamp/genies/LithoGenie.py", line 954, in main
    CoordDict[i][contig].append(int(numOrf))
ValueError: invalid literal for int() with base 10: 'JV1'

seanmcallister commented 1 year ago

Essentially the same error in MnGenie:

(magiclamp) sean-mini-server:2022_PCC_ManuscriptWork sean_server$ MagicLamp.py MnGenie -bin_dir bins -bin_ext faa -out heme_MnGenie -t 4 --orfs --makeplots
checking arguments
.
.
.
All required arguments provided!

starting main pipeline...
analyzing All_proteins_fullannotation_HEMEGENES.faa: 102%   
Identifying genomic proximities and putative operons
Traceback (most recent call last):
  File "/Users/sean_server/software/MagicLamp/MagicLamp.py", line 42, in <module>
    MnGenie.main()
  File "/Users/sean_server/software/MagicLamp/genies/MnGenie.py", line 834, in main
    CoordDict[i][contig].append(int(numOrf))
ValueError: invalid literal for int() with base 10: 'Zeta11'

Arkadiy-Garber commented 1 year ago

The issue here is that the MagicLamp genies all expect protein files with headers formatted the way Prodigal or Prokka writes them: a contig identifier followed by an underscore and an integer, which indicates the ORF's relative position on the contig. This is why the preferred input is a contigs file, which is then used as input to a Prodigal run. The genies will accept protein .faa files, but only if their headers are formatted in this specific way.
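To make the expected header format concrete, here is a minimal sketch of a pre-flight check that could be run on a .faa file before passing it to a genie. It is not taken from the MagicLamp source: the regular expression, the function names `split_header` and `find_bad_headers`, and the example headers are all assumptions for illustration, based only on the description above (contig identifier, underscore, integer) and on the `int(numOrf)` call in the tracebacks.

```python
import re

# Assumed format: "<contig>_<integer>" as the first whitespace-delimited
# token of the header. This pattern is illustrative, not MagicLamp's own.
HEADER_RE = re.compile(r"^(?P<contig>.+)_(?P<orf>\d+)$")


def split_header(header):
    """Return (contig, orf_number), or None if the header does not fit."""
    tokens = header.lstrip(">").split()
    if not tokens:
        return None
    match = HEADER_RE.match(tokens[0])
    if match is None:
        return None
    return match.group("contig"), int(match.group("orf"))


def find_bad_headers(fasta_lines):
    """Collect header lines whose ID does not end in an underscore and integer."""
    bad = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">") and split_header(line) is None:
            bad.append(line)
    return bad


if __name__ == "__main__":
    # Hypothetical headers: the first is Prodigal-like, the second is not.
    example = [">contig_12_3 # 1 # 300 # 1", "MKV", ">gene_Zeta11", "MAA"]
    for header in find_bad_headers(example):
        print("unexpected header format:", header)
```

Any header reported by a check like this ends in something other than an integer after the last underscore, which would be consistent with the `ValueError` shown in the tracebacks above.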

When I made this tool, I didn't consider that many users may already have previously produced and annotated proteins they want to analyze. I am considering a major overhaul of that part of the algorithm to allow submission of proteins regardless of header format, but I have not had time to do this yet.