geronimp / enrichM

Toolbox for comparative genomics of MAGs

gene naming issue #91

Open wwood opened 4 years ago

wwood commented 4 years ago

I guess this is caused by a ~ character appearing in some gene name, when ~ is also the character used to separate the genome from the gene name?

enrichm annotate --genome_directory dereplicated_representatives_fasta/ --parallel 20 --ko --ko_hmm     
[2019-10-28 13:42:57 PM] INFO: Command: /srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm annotate --genome_directory dereplicated_representatives_fasta/ --parallel 20 --ko --ko_hmm                          
[2019-10-28 13:42:57 PM] INFO: Running the annotate pipeline
[2019-10-28 13:42:57 PM] INFO: Running pipeline: annotate
[2019-10-28 13:42:57 PM] INFO: Setting up for genome annotation
[2019-10-28 13:42:57 PM] INFO: Calling proteins for annotation
[2019-10-28 13:42:57 PM] INFO:     - Calling proteins for 716 genomes
[2019-10-28 20:00:27 PM] INFO: Starting annotation:
[2019-10-28 20:00:27 PM] INFO:     - Annotating genomes with ko ids using DIAMOND
[2019-10-28 20:00:27 PM] INFO:     - BLASTing genomes
Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm", line 374, in <module>
    r.run_enrichm(args, sys.argv)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/run.py", line 359, in run_enrichm                                                                                            
    args.protein_files)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 815, in annotate_pipeline                                                                                 
    self.GENOME_KO)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 256, in annotate_diamond                                                                                  
    for genome_name, batch in self.get_batches(output_annotation_path):
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 277, in get_batches                                                                                       
    genome_id, _ = split_line[0].split('~')
ValueError: too many values to unpack (expected 2)
geronimp commented 4 years ago

Hey,

Thanks for this report. Yep, looks like a straight-up bug. I'll change it so it only splits on the first '~'.
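
For reference, a minimal sketch of what splitting on only the first '~' could look like. This is not the actual enrichM patch: the helper name `split_sequence_id` and the example ID are made up for illustration, and it assumes the genome name itself never contains a '~'.

```python
def split_sequence_id(sequence_id, separator="~"):
    """Split '<genome>~<gene>' on the first separator only (hypothetical helper)."""
    # partition() splits at the first occurrence, so any further '~'
    # characters stay inside the gene name.
    genome_id, found, gene_name = sequence_id.partition(separator)
    if not found:
        raise ValueError(f"no {separator!r} in sequence id: {sequence_id!r}")
    return genome_id, gene_name


# Made-up ID whose gene name contains a second '~'.
print(split_sequence_id("genome_1~contig_5~gene_2"))
# -> ('genome_1', 'contig_5~gene_2')
```

The equivalent one-line change to the failing statement would be passing a `maxsplit` of 1, i.e. `split_line[0].split('~', 1)`, which always yields at most two parts.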