apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/
Other

KeyError: 'provirus_names' when running genomad score_calibration #8

Closed: rdenise closed this issue 1 year ago

rdenise commented 1 year ago

I encountered the following error when running genomad score_calibration:

Traceback (most recent call last):
  File "/home/remi/miniconda3/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1074, in end_to_end
    ctx.invoke(
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 675, in score_calibration
    genomad.score_calibration.main(input, output, composition, force_auto, verbose)
  File "/home/remi/miniconda3/envs/genomad/lib/python3.10/site-packages/genomad/modules/score_calibration.py", line 316, in main
    len(score_dict["contig_names"]) + len(score_dict["provirus_names"])
KeyError: 'provirus_names'

I was running the following command:

genomad end-to-end -t 30 --composition metagenome --enable-score-calibration ZSM005_contigs.filtered.sorted.fasta ZSM005_contigs_score_calibration genomad_db

I am using geNomad version 1.3.2, installed via conda.

I can provide the fasta file ZSM005_contigs.filtered.sorted.fasta if it would be helpful in troubleshooting this error.

apcamargo commented 1 year ago

I thought I had fixed this bug. I'll look into it. Can you share the data with me? It would be really helpful.

rdenise commented 1 year ago

Do you have an email address I can send the FASTA file to?

apcamargo commented 1 year ago

I found the underlying problem and will push a fix shortly. The error occurs when score calibration is run on a sample in which no proviruses are identified.

That said, I had to use a different dataset to reproduce the issue, because when I ran geNomad on your data it did find a provirus. That discrepancy shouldn't happen if we used the same parameters. I'll follow up by email.
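The failure mode described above (a crash only when no proviruses are identified) can be illustrated with a minimal Python sketch. This is not the actual geNomad patch: `count_sequences` is a hypothetical helper and the dictionary contents are made up. It only assumes, as the traceback suggests, that `score_dict` has no `provirus_names` key when no proviruses are found, and shows how `dict.get` with an empty default turns that case into "zero proviruses" instead of a `KeyError`.

```python
# Hypothetical sketch, not geNomad's real code: score_dict stands in for the
# dictionary indexed at line 316 of score_calibration.py; its real layout may differ.


def count_sequences(score_dict: dict) -> int:
    """Count scored sequences, tolerating samples with no proviruses.

    Direct indexing (score_dict["provirus_names"]) raises KeyError when the
    key was never created. dict.get with an empty-list default treats a
    missing key as an empty list instead.
    """
    contigs = score_dict.get("contig_names", [])
    proviruses = score_dict.get("provirus_names", [])
    return len(contigs) + len(proviruses)


if __name__ == "__main__":
    # Made-up example data for illustration only.
    with_proviruses = {
        "contig_names": ["contig_1", "contig_2"],
        "provirus_names": ["provirus_a"],
    }
    without_proviruses = {"contig_names": ["contig_1", "contig_2"]}

    # The original expression fails on a sample without proviruses.
    try:
        len(without_proviruses["contig_names"]) + len(without_proviruses["provirus_names"])
    except KeyError as err:
        print(f"direct indexing fails: KeyError {err}")

    print(count_sequences(with_proviruses))     # 3
    print(count_sequences(without_proviruses))  # 2
```

An alternative would be to always initialize `provirus_names` to an empty list when the dictionary is built, which fixes the problem at its source rather than at every lookup.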