apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

question about nn-classification #35

Open lilinzhou opened 1 year ago

lilinzhou commented 1 year ago

Hello, thanks for the impressive tool for provirus identification. I have two questions about running it:

  1. Can this tool be applied to eukaryotic organisms, such as axenic/authentic algae or fungi? If so, do any parameters need to be adjusted (such as "The Genetic Codes" or anything else)?
  2. When I ran nn-classification on its own using the test dataset (note: I ran it on a server without a GPU), I got the error shown below. Do you have any suggestions for dealing with it?

Command: genomad nn-classification --cleanup --threads 2 GCF_009025895.1.fa output

Executing geNomad nn-classification (v1.7.0). This will classify the input sequences into chromosome, plasmid, or virus based on the nucleotide sequence.

Outputs:
  out_1/GCF_009025895.1_nn_classification
  ├── GCF_009025895.1_nn_classification.json (execution parameters)
  ├── GCF_009025895.1_encoded_sequences (directory containing encoded sequence data)
  ├── GCF_009025895.1_nn_classification.tsv (contig classification: tabular format)
  ├── GCF_009025895.1_nn_classification.npz (contig classification: binary format)
  ├── GCF_009025895.1_encoded_proviruses (directory containing encoded sequence data)
  ├── GCF_009025895.1_provirus_nn_classification.tsv (provirus classification: tabular format)
  └── GCF_009025895.1_provirus_nn_classification.npz (provirus classification: binary format)

[16:24:41] Executing genomad nn-classification.
[16:24:42] Creating the out_1/GCF_009025895.1_nn_classification/GCF_009025895.1_encoded_sequences directory.
[16:24:45] Encoded sequence data written to GCF_009025895.1_encoded_sequences.
[16:24:45] Creating the out_1/GCF_009025895.1_nn_classification/GCF_009025895.1_encoded_proviruses directory.
[16:24:46] Encoded provirus data written to GCF_009025895.1_encoded_proviruses.
Traceback (most recent call last):
  File "/path/to/python3.9.6/bin/genomad", line 8, in <module>
    sys.exit(cli())
  File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/path/to/python3.9.6/lib/python3.9/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/path/to/python3.9.6/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/path/to/python3.9.6/lib/python3.9/site-packages/genomad/cli.py", line 719, in nn_classification
    genomad.nn_classification.main(
  File "/path/to/python3.9.6/lib/python3.9/site-packages/genomad/modules/nn_classification.py", line 304, in main
    TimeRemainingColumn(elapsed_when_finished=True),
TypeError: __init__() got an unexpected keyword argument 'elapsed_when_finished'

Software versions I used:

Python version: 3.9.6
TensorFlow version: I tried 2.8.0, 2.10.0, and 2.13.0
geNomad database version: v1.5

apcamargo commented 1 year ago

This seems to be a problem with the rich library. Can you update it and try again?
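The traceback above fails because the installed `rich` rejects the `elapsed_when_finished` argument passed to `TimeRemainingColumn`. A minimal sketch for checking this before and after upgrading (for example with `pip install --upgrade rich`, assuming a pip-managed environment). The helper names `accepts_keyword` and `check_rich` are hypothetical and are not part of geNomad or rich:

```python
import inspect


def accepts_keyword(callable_obj, name):
    """Return True if callable_obj can be called with the keyword argument `name`."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def check_rich():
    """Return True/False for the installed rich, or None if rich is not installed."""
    try:
        from rich.progress import TimeRemainingColumn
    except ImportError:
        return None
    return accepts_keyword(TimeRemainingColumn, "elapsed_when_finished")


if __name__ == "__main__":
    # False means the installed rich is too old for the call in the traceback.
    print("rich supports elapsed_when_finished:", check_rich())
```

If this prints `False`, the installed rich predates the argument and upgrading it should resolve the `TypeError`.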

lilinzhou commented 1 year ago

It works well after updating the rich library, thank you. Do you have any suggestions regarding predictions for eukaryotic organisms?

apcamargo commented 1 year ago

Oh, sorry, I forgot about that.

geNomad can identify eukaryotic viruses, but you should be more careful about false positives. Try selecting only viruses with at least one hallmark gene (--min-virus-hallmarks 1).
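The `--min-virus-hallmarks 1` option asks geNomad to keep only viruses with at least one hallmark gene. The same idea can be sketched as a post-hoc filter on a tab-separated results table. This assumes a column named `n_hallmarks` holding the hallmark count per sequence; the column names, the helper `filter_by_hallmarks`, and the example rows are illustrative assumptions, not real geNomad output:

```python
import csv
import io


def filter_by_hallmarks(tsv_text, min_hallmarks=1, column="n_hallmarks"):
    """Keep rows whose hallmark count (assumed column name) is >= min_hallmarks."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if int(row[column]) >= min_hallmarks]


# Made-up rows for illustration only; not taken from any real run.
EXAMPLE_TSV = (
    "seq_name\tvirus_score\tn_hallmarks\n"
    "contig_1\t0.97\t3\n"
    "contig_2\t0.81\t0\n"
    "contig_3\t0.88\t1\n"
)

kept = filter_by_hallmarks(EXAMPLE_TSV)
print([row["seq_name"] for row in kept])  # ['contig_1', 'contig_3']
```

Here `contig_2` is dropped because it has no hallmark genes, even though its score is fairly high; this is the kind of candidate that deserves extra scrutiny in eukaryotic datasets.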