geronimp / enrichM

Toolbox for comparative genomics of MAGs

Index out of range error during enrichm annotate #120

Closed. mhyleung closed this issue 3 years ago

mhyleung commented 3 years ago

Dear all

I ran a simple enrichm annotate test with two genomes in .fna format, and I received the following error:

(enrichm) [mhyleung@e2gpu001 enrichm]$ enrichm annotate --output /disk/rdisk08/mhyleung/shotgun2/enrichm/output_directory_new --genome_directory /disk/rdisk08/mhyleung/shotgun2/enrichm/input_bin_genome/test --ko_hmm --orthologs --threads 24 --log /disk/rdisk08/mhyleung/shotgun2/enrichm/log --force --verbosity 5

[2021-02-03 07:54:50 AM] INFO: Command: /disk/rdisk08/mhyleung/miniconda2/envs/enrichm/bin/enrichm annotate --output /disk/rdisk08/mhyleung/shotgun2/enrichm/output_directory_new --genome_directory /disk/rdisk08/mhyleung/shotgun2/enrichm/input_bin_genome/test --ko_hmm --orthologs --threads 24 --log /disk/rdisk08/mhyleung/shotgun2/enrichm/log --force --verbosity 5
[2021-02-03 07:54:50 AM] INFO: Running the annotate pipeline
[2021-02-03 07:54:50 AM] INFO: Running pipeline: annotate
[2021-02-03 07:54:50 AM] INFO: Setting up for genome annotation
[2021-02-03 07:54:50 AM] INFO: Calling proteins for annotation
[2021-02-03 07:54:50 AM] INFO:     - Calling proteins for 2 genomes
[2021-02-03 07:54:50 AM] DEBUG: ls /disk/rdisk08/mhyleung/shotgun2/enrichm/input_bin_genome/test/*.fna |                     sed 's/.fna//g' |                     grep -o '[^/]*$' |                     parallel -j 5                         prodigal                             -q                             -p meta                             -o /dev/null                             -d /disk/rdisk08/mhyleung/shotgun2/enrichm/output_directory_new/genome_genes/{}.fna                             -a /disk/rdisk08/mhyleung/shotgun2/enrichm/output_directory_new/genome_proteins/{}.faa                             -i /disk/rdisk08/mhyleung/shotgun2/enrichm/input_bin_genome/test/{}.fna                             > /dev/null 2>&1
[2021-02-03 07:55:25 AM] DEBUG: Finished
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/annotate.py", line 28, in parse_genomes
    genome = Genome(*params)
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/genome.py", line 44, in __init__
    sequence = Sequence(protein_description, protein_sequence, gene_sequence)
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/genome.py", line 245, in __init__
    = [x.split('=')[1] for x in stats.split(';')]
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/genome.py", line 245, in <listcomp>
    = [x.split('=')[1] for x in stats.split(';')]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/bin/enrichm", line 352, in <module>
    run.run_enrichm(args, sys.argv)
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/run.py", line 409, in run_enrichm
    pipeline(args)
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/run.py", line 300, in run_annotate
    annotate.annotate_pipeline(args.genome_directory,
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/annotate.py", line 755, in annotate_pipeline
    genomes_list = self.parse_genome_inputs(genome_directory, protein_directory,
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/site-packages/enrichm/annotate.py", line 737, in parse_genome_inputs
    genomes_list += self.pool.map(parse_genomes, chunk)
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/disk/rdisk08/mhyleung/miniconda2/envs/enrichm/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

If anyone can point me in the right direction for fixing this, that would be great! Thank you so much!

Regards

Marcus

geronimp commented 3 years ago

Hi Marcus,

Thanks for your error report. I just tried your command and I can't replicate the error. Does the directory you provided to --genome_directory contain genomes that end with .fna?
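
One way to check this is sketched below in Python. It only assumes that a usable genome file has the .fna suffix and that its first non-empty line starts with ">" (plain FASTA). The function name check_genome_directory is hypothetical and is not part of enrichm.

```python
from pathlib import Path


def check_genome_directory(genome_directory, suffix=".fna"):
    """Return (fasta_like, suspicious) lists of file names with the given suffix.

    Hypothetical helper, not part of enrichm. A file counts as FASTA-like
    when its first non-empty line starts with '>'.
    """
    fasta_like, suspicious = [], []
    for path in sorted(Path(genome_directory).iterdir()):
        # Skip sub-directories and files with any other extension.
        if not path.is_file() or path.suffix != suffix:
            continue
        with path.open() as handle:
            # First non-empty line, or "" for an empty file.
            first = next((line for line in handle if line.strip()), "")
        if first.startswith(">"):
            fasta_like.append(path.name)
        else:
            suspicious.append(path.name)
    return fasta_like, suspicious
```

Calling check_genome_directory on the --genome_directory path and looking at the second list shows any .fna files that are empty or do not begin with a FASTA header.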

mhyleung commented 3 years ago

Dear geronimp

I think that was an issue with my input files. I have changed them and it seems fine now. I will close this thread now, thanks!
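
For anyone who lands here with the same traceback: the thread does not say what exactly was wrong with the input files, but the failing expression in genome.py (x.split('=')[1] for each x in stats.split(';')) raises IndexError whenever one of the ';'-separated pieces contains no '='. The sketch below reproduces that in isolation. It assumes that stats holds the last '#'-separated part of a Prodigal-style protein header (ID=...;partial=...;...); how enrichm actually derives stats is not shown in the traceback. The function names are hypothetical.

```python
def parse_stats(stats):
    # Same expression as in the traceback: fails if a piece has no '='.
    return [x.split('=')[1] for x in stats.split(';')]


def parse_stats_checked(stats):
    # Hypothetical variant that reports the offending field instead of
    # raising a bare IndexError.
    values = []
    for field in stats.split(';'):
        key, sep, value = field.partition('=')
        if not sep:
            raise ValueError(
                f"expected key=value, got {field!r} in header stats {stats!r}"
            )
        values.append(value)
    return values


if __name__ == "__main__":
    # Header stats in the style Prodigal writes (assumed example values).
    good = "ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.556"
    print(parse_stats(good))

    # A description without key=value fields triggers the error from the issue.
    bad = "hypothetical protein"
    try:
        parse_stats(bad)
    except IndexError as err:
        print("IndexError:", err)
```

If the headers of the protein or gene files do not follow the key=value pattern, for example because the sequences were renamed or did not come from Prodigal, that would be consistent with the error above.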