geronimp / enrichM

Toolbox for comparative genomics of MAGs

Error when using protein sequences #92

Open ashley-isaac opened 4 years ago

ashley-isaac commented 4 years ago

Hi Joel and community,

I ran EnrichM on my MAGs and it worked fine for nucleotide FASTA files (.fa). I'd like to run protein sequences through the pipeline, so I annotated the MAGs with Prokka and ran the resulting .faa files through EnrichM. However, I got the error below. Any help would be appreciated.

Thanks, Ashley

(EnrichM) ai37@aduae387-lap:~/Representatives$ enrichm annotate --output alpha_rep_output/ --protein_directory genome_proteins/ --ko --pfam --threads 16
[2019-11-07 18:23:44 PM] INFO: Running command: /home/ai37/miniconda3/envs/EnrichM/bin/enrichm annotate --output alpha_rep_output/ --protein_directory genome_proteins/ --ko --pfam --threads 16
[2019-11-07 18:23:44 PM] INFO: Loading databases
[2019-11-07 18:23:44 PM] INFO: Loading reference db paths
[2019-11-07 18:23:44 PM] INFO: Running pipeline: annotate
[2019-11-07 18:23:44 PM] INFO: Setting up for genome annotation
[2019-11-07 18:23:44 PM] INFO: Using provided proteins
[2019-11-07 18:23:44 PM] INFO: Preparing genomes for annotation
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 227, in __init__
    = description.split(' # ')
ValueError: not enough values to unpack (expected 5, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 50, in parse_genomes
    genome = Genome(*params)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 67, in __init__
    sequence = Sequence(description, sequence)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 231, in __init__
    raise Exception("Error parsing genome proteins. Was the output from prodigal?")
Exception: Error parsing genome proteins. Was the output from prodigal?
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ai37/miniconda3/envs/EnrichM/bin/enrichm", line 357, in <module>
    r.main(args, sys.argv)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/run.py", line 323, in main
    args.protein_files)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 641, in do
    genomes_list = self.parse_genome_inputs(genome_directory, protein_directory, genome_files, protein_files)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 622, in parse_genome_inputs
    genomes_list += self.pool.map(parse_genomes, chunk)
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
Exception: Error parsing genome proteins. Was the output from prodigal?

geronimp commented 4 years ago

Hey Ashley, thanks for the report. EnrichM expects protein files produced specifically by Prodigal as input, so it's failing here. I'll work on an enhancement that lets you use any protein file as input, although no .gff file will be provided in that case.
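For anyone hitting the same error, a quick way to confirm the cause is to check whether the .faa headers have the Prodigal layout. The traceback shows the header being split on ' # ' into five values; Prodigal headers carry the sequence name, start, end, strand, and an attribute string separated that way, while Prokka headers do not. The sketch below is not part of EnrichM: the function names are hypothetical, and it assumes the whole header line is what gets split.

```python
from typing import List


def looks_like_prodigal_header(header: str) -> bool:
    """Return True if a FASTA header has five ' # '-separated fields
    (name, start, end, strand, attributes), as Prodigal writes them."""
    fields = header.lstrip(">").strip().split(" # ")
    if len(fields) != 5:
        return False
    _name, start, end, strand, _attributes = fields
    return start.isdigit() and end.isdigit() and strand in {"1", "-1"}


def non_prodigal_headers(faa_path: str, limit: int = 5) -> List[str]:
    """Collect up to `limit` headers in a protein FASTA file that do not
    match the Prodigal layout (hypothetical helper, for diagnosis only)."""
    offending = []
    with open(faa_path) as handle:
        for line in handle:
            if line.startswith(">") and not looks_like_prodigal_header(line):
                offending.append(line.rstrip("\n"))
                if len(offending) >= limit:
                    break
    return offending
```

If non_prodigal_headers returns anything for a file in the protein directory, that file will most likely trigger the "Was the output from prodigal?" exception. Until the enhancement lands, passing the nucleotide FASTA files instead lets EnrichM call the proteins itself.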

chenyy-1026 commented 3 years ago

Hi Ashley,

I ran nucleotide files through EnrichM and got the error below (ValueError: not enough values to unpack). I guess you may have solved this problem by now. Could you help me with this error?

Many thanks, Yuying

(enrichm_0.5.0) chenyy@super-AS-2023US-TR4:/mnt/nfs/River_meta/enrichM$
[2021-05-10 21:39:16 PM] INFO: Command: /usr/local/bin/enrichm annotate --log annotate_LOG --output OUTPUT --force --genome_directory bin --ko_hmm --threads 20 --parallel 4
[2021-05-10 21:39:16 PM] INFO: Running the annotate pipeline
[2021-05-10 21:39:16 PM] INFO: Running pipeline: annotate
[2021-05-10 21:39:16 PM] INFO: Setting up for genome annotation
[2021-05-10 21:39:16 PM] INFO: Calling proteins for annotation
[2021-05-10 21:39:16 PM] INFO: - Calling proteins for 219 genomes
[2021-05-10 22:48:50 PM] INFO: Starting annotation:
[2021-05-10 22:48:50 PM] INFO: - Annotating genomes with ko ids using HMMs

(enrichm_0.5.0) chenyy@super-AS-2023US-TR4:/mnt/nfs/River_meta/enrichM$
Traceback (most recent call last):
  File "/usr/local/bin/enrichm", line 342, in <module>
    run.run_enrichm(args, sys.argv)
  File "/usr/local/lib/python3.8/dist-packages/enrichm/run.py", line 307, in run_enrichm
    annotate.annotate_pipeline(args.genome_directory,
  File "/usr/local/lib/python3.8/dist-packages/enrichm/annotate.py", line 825, in annotate_pipeline
    self.hmmsearch_annotation(genomes_list,
  File "/usr/local/lib/python3.8/dist-packages/enrichm/annotate.py", line 363, in hmmsearch_annotation
    genome.add(output_annotation_path, self.evalue, self.bit, self.aln_query,
  File "/usr/local/lib/python3.8/dist-packages/enrichm/genome.py", line 493, in add
    for seqname, annotations, evalue, annotation_range in iterator:
  File "/usr/local/lib/python3.8/dist-packages/enrichm/genome.py", line 493, in from_hmmsearchresults
    seqname, _, tlen, kohmm, accession, qlen, _, score, \
ValueError: not enough values to unpack (expected 22, got 4)
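Note that this is a different failure from the one in the original report: it is raised while reading hmmsearch results, not while parsing protein headers. "expected 22, got 4" suggests that at least one result line has far fewer fields than the parser wants, which can happen if an output file is truncated or malformed. HMMER's per-domain table has 22 whitespace-separated fields before the free-text description, which matches the count in the traceback. The sketch below scans a results file for short rows. It is only a diagnostic idea, not EnrichM code: the function name is hypothetical, and it assumes whitespace-delimited rows with '#' comment lines.

```python
from typing import List, Tuple

# Field count taken from the "expected 22" in the traceback (an assumption
# about what the parser needs, not a value read from EnrichM's source).
EXPECTED_FIELDS = 22


def short_rows(table_path: str, expected: int = EXPECTED_FIELDS) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for every data row that has fewer
    than `expected` whitespace-separated fields."""
    problems = []
    with open(table_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            field_count = len(line.split())
            if field_count < expected:
                problems.append((line_number, field_count))
    return problems
```

Running this over the intermediate hmmsearch files in the output directory should show which genome produced the short row, and that genome can then be inspected or re-run on its own.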