apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

KeyError: 'ATAACA' happened in Executing genomad marker-classification. #56

Closed YANG-Jiwei closed 10 months ago

YANG-Jiwei commented 10 months ago

Hi, I ran genomad end-to-end --cleanup --splits 8 ... and got an error when executing genomad marker-classification.


[19:29:31] Executing genomad marker-classification.
[19:29:31] Creating the /data/yangjiwei/XU_metagenome_analysis/genomad/genomad_Output/MEGAHIT -Run1_C4_contigs_cluster_output_Prokaryote_output_marker_classification directory.
Traceback (most recent call last):
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1266, in end_to_end
    ctx.invoke(
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 656, in marker_classification
    genomad.marker_classification.main(
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 503, in main
    ) = get_feature_array(
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 247, in get_feature_array
    for annotated_contig in yield_annotated_contigs(
  File "/home/yangjiwei/miniconda3/envs/mamba/envs/genomad/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 186, in yield_annotated_contigs
    annotated_contigs_dict[contig].gene_rbs.append(rbs_categories_dict[rbs])
KeyError: 'ATAACA'


I would be grateful for your answer.

pck00 commented 10 months ago

Getting the exact same error running end-to-end on some files. Seems dataset dependent.

mperisin-lallemand commented 10 months ago

I am getting the same error too.

apcamargo commented 10 months ago

This was introduced in a recent update that replaced prodigal-gv with pyrodigal-gv. I'll investigate it further. Meanwhile you can downgrade to version 1.7.0 and it will work just fine.

@pck00 and @mperisin-lallemand. Can you send me the error message? I want to check if the problematic key was the same.

mperisin-lallemand commented 10 months ago

[14:49:27] geNomad find-proviruses finished!

Executing geNomad marker-classification (v1.7.2). This will classify the input sequences into chromosome, plasmid, or virus based on the presence of geNomad markers and other gene-related features.
Outputs:
test_out/CAN2567026_mseq340_001_modified_marker_classification
├── CAN2567026_mseq340_001_modified_marker_classification.json (execution parameters)
├── CAN2567026_mseq340_001_modified_features.tsv (sequence feature data: tabular format)
├── CAN2567026_mseq340_001_modified_features.npz (sequence feature data: binary format)
├── CAN2567026_mseq340_001_modified_marker_classification.tsv (sequence classification: tabular format)
├── CAN2567026_mseq340_001_modified_marker_classification.npz (sequence classification: binary format)
├── CAN2567026_mseq340_001_modified_provirus_features.tsv (provirus feature data: tabular format)
├── CAN2567026_mseq340_001_modified_provirus_features.npz (provirus feature data: binary format)
├── CAN2567026_mseq340_001_modified_provirus_marker_classification.tsv (provirus classification: tabular format)
└── CAN2567026_mseq340_001_modified_provirus_marker_classification.npz (provirus classification: binary format)

[14:49:27] Executing genomad marker-classification.
[14:49:27] Creating the test_out/CAN2567026_mseq340_001_modified_marker_classification directory.
Traceback (most recent call last):
  File "/RAID6_Data/software/anaconda3/envs/mvp/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/genomad/cli.py", line 1266, in end_to_end
    ctx.invoke(
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/genomad/cli.py", line 656, in marker_classification
    genomad.marker_classification.main(
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 503, in main
    ) = get_feature_array(
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 247, in get_feature_array
    for annotated_contig in yield_annotated_contigs(
  File "/RAID6_Data/software/anaconda3/envs/mvp/lib/python3.10/site-packages/genomad/modules/marker_classification.py", line 186, in yield_annotated_contigs
    annotated_contigs_dict[contig].gene_rbs.append(rbs_categories_dict[rbs])
KeyError: 'GATAAT'

apcamargo commented 10 months ago

My guess is that this was caused by this commit in pyrodigal-gv. What do you think, @althonos?
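
For readers unfamiliar with this failure mode, here is a minimal sketch of what the last traceback frame is doing: a plain dictionary lookup keyed by the RBS motif string raises KeyError as soon as the gene predictor reports a motif the table does not contain. The table contents, category values, and function names below are hypothetical stand-ins, not geNomad's actual code, and the tolerant variant is only an illustration of a defensive lookup, not the fix that was applied upstream.

```python
# Hypothetical stand-in for the lookup table used in marker_classification.py.
# The real keys and category values are not shown in this thread.
RBS_CATEGORIES = {"AGGAG": 0, "GGAGG": 1, "None": 2}


def strict_category(motif):
    """Mirror the failing pattern: direct indexing raises KeyError on unknown motifs."""
    return RBS_CATEGORIES[motif]


def tolerant_category(motif, fallback=-1):
    """Defensive variant: map any motif missing from the table to a fallback value."""
    return RBS_CATEGORIES.get(motif, fallback)


if __name__ == "__main__":
    for motif in ("AGGAG", "ATAACA", "GATAAT"):
        try:
            print(motif, "strict:", strict_category(motif))
        except KeyError as err:
            print(motif, "strict lookup failed with KeyError:", err)
        print(motif, "tolerant:", tolerant_category(motif))
```

Whether silently mapping unknown motifs to a fallback is acceptable depends on how the downstream features are used, so this is a diagnostic aid rather than a recommended patch.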

apcamargo commented 10 months ago

@YANG-Jiwei, @pck00, and @mperisin-lallemand. Until I fix this issue properly, you can either downgrade pyrodigal-gv to 0.2.0 or geNomad to 1.7.0.

pck00 commented 10 months ago

> This was introduced in a recent update that replaced prodigal-gv with pyrodigal-gv. I'll investigate it further. Meanwhile you can downgrade to version 1.7.0 and it will work just fine.
>
> @pck00 and @mperisin-lallemand. Can you send me the error message? I want to check if the problematic key was the same.

KeyError: 'ATAACA' for me

mperisin-lallemand commented 10 months ago

Thanks! I downgraded to geNomad 1.7.0 and it worked.

apcamargo commented 10 months ago

pyrodigal-gv reverted to the previous behaviour and geNomad should work fine with pyrodigal-gv version 0.3.1 (thanks @althonos). I'm pushing a new geNomad release that will require this version.
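
To check which versions an environment has, a small sketch like the following may help. It compares the installed packages against the versions named in this thread as working (pyrodigal-gv 0.2.0 and 0.3.1, geNomad 1.7.0). The distribution names `pyrodigal-gv` and `genomad` are assumptions about how the packages are registered in the environment, and a version missing from the list is not necessarily broken; it just was not mentioned here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread. This is not an exhaustive
# compatibility list, and the distribution names are assumptions.
KNOWN_GOOD = {
    "pyrodigal-gv": {"0.2.0", "0.3.1"},
    "genomad": {"1.7.0"},
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(lookup=installed_version):
    """Map each package to (installed version, whether the thread names it as working)."""
    result = {}
    for name, good_versions in KNOWN_GOOD.items():
        found = lookup(name)
        result[name] = (found, found in good_versions)
    return result


if __name__ == "__main__":
    for name, (found, named_as_working) in report().items():
        status = "named as working in the thread" if named_as_working else "not named in the thread"
        print(f"{name}: {found or 'not installed'} ({status})")
```

The `lookup` parameter exists only so the function can be exercised without touching the real environment.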