apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Error with geNomad v1.8.0, missing tensorflow.keras #101

Closed flefler closed 1 month ago

flefler commented 1 month ago

Hello,

I am running geNomad v1.8.0. I followed the mamba install instructions (mamba create -n genomad -c conda-forge -c bioconda genomad) and downloaded the databases. I encounter the following error when running the end-to-end workflow. I have tried removing and reinstalling geNomad, but I am unable to determine the source of the issue or how to fix it. There were no issues when installing the environment.



╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  Executing geNomad annotate (v1.8.0). This will perform gene calling in the input sequences and annotate the predicted proteins with geNomad's markers.                                                        │
│  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│  Outputs:                                                                                                                                                                                                      │
│    12_geNomad/M67/final.contigs_annotate                                                                                                                                                                       │
│    ├── final.contigs_annotate.json (execution parameters)                                                                                                                                                      │
│    ├── final.contigs_genes.tsv (gene annotation data)                                                                                                                                                          │
│    ├── final.contigs_taxonomy.tsv (taxonomic assignment)                                                                                                                                                       │
│    ├── final.contigs_mmseqs2.tsv (MMseqs2 output file)                                                                                                                                                         │
│    └── final.contigs_proteins.faa (protein FASTA file)                                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[15:49:18] Executing genomad annotate.                                                                                                                                                                            
[15:49:19] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.                                         
[15:49:19] final.contigs_proteins.faa was found. Skipping gene prediction with pyrodigal-gv.                                                                                                                      
[15:49:19] final.contigs_mmseqs2.tsv was found. Skipping protein annotation with MMseqs2.                                                                                                                         
[15:49:24] Gene data was written to final.contigs_genes.tsv.                                                                                                                                                      
[15:49:24] Taxonomic assignment data was written to final.contigs_taxonomy.tsv.                                                                                                                                   
[15:49:24] geNomad annotate finished!                                                                                                                                                                             
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  Executing geNomad find-proviruses (v1.8.0). This will find putative proviral regions within the input sequences.                                                                                              │
│  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│  Outputs:                                                                                                                                                                                                      │
│    12_geNomad/M67/final.contigs_find_proviruses                                                                                                                                                                │
│    ├── final.contigs_find_proviruses.json (execution parameters)                                                                                                                                               │
│    ├── final.contigs_provirus.tsv (provirus data)                                                                                                                                                              │
│    ├── final.contigs_provirus.fna (provirus nucleotide sequences)                                                                                                                                              │
│    ├── final.contigs_provirus_proteins.faa (provirus protein sequences)                                                                                                                                        │
│    ├── final.contigs_provirus_genes.tsv (provirus gene annotation data)                                                                                                                                        │
│    ├── final.contigs_provirus_taxonomy.tsv (provirus taxonomic assignment)                                                                                                                                     │
│    ├── final.contigs_provirus_mmseqs2.tsv (MMseqs2 output file)                                                                                                                                                │
│    └── final.contigs_provirus_aragorn.tsv (Aragorn output file)                                                                                                                                                │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[15:49:24] Executing genomad find-proviruses.                                                                                                                                                                     
[15:49:24] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.                                         
[15:49:25] final.contigs_provirus_mmseqs2.tsv was found. Skipping integrase search.                                                                                                                               
[15:49:25] final.contigs_provirus_aragorn.tsv was found. Skipping tRNA identification.                                                                                                                            
[15:49:25] Provirus regions identified.                                                                                                                                                                           
[15:49:25] Provirus data was written to final.contigs_provirus.tsv.                                                                                                                                               
[15:49:25] Provirus nucleotide sequences were written to final.contigs_provirus.fna.                                                                                                                              
[15:49:26] Provirus protein sequences were written to final.contigs_provirus_proteins.faa.                                                                                                                        
[15:49:26] Provirus gene data was written to final.contigs_provirus_genes.tsv.                                                                                                                                    
[15:49:26] Taxonomic assignment data was written to final.contigs_provirus_taxonomy.tsv.                                                                                                                          
[15:49:26] geNomad find-proviruses finished!                                                                                                                                                                      
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  Executing geNomad marker-classification (v1.8.0). This will classify the input sequences into chromosome, plasmid, or virus based on the presence of geNomad markers and other gene-related features.         │
│  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  │
│  Outputs:                                                                                                                                                                                                      │
│    12_geNomad/M67/final.contigs_marker_classification                                                                                                                                                          │
│    ├── final.contigs_marker_classification.json (execution parameters)                                                                                                                                         │
│    ├── final.contigs_features.tsv (sequence feature data: tabular format)                                                                                                                                      │
│    ├── final.contigs_features.npz (sequence feature data: binary format)                                                                                                                                       │
│    ├── final.contigs_marker_classification.tsv (sequence classification: tabular format)                                                                                                                       │
│    ├── final.contigs_marker_classification.npz (sequence classification: binary format)                                                                                                                        │
│    ├── final.contigs_provirus_features.tsv (provirus feature data: tabular format)                                                                                                                             │
│    ├── final.contigs_provirus_features.npz (provirus feature data: binary format)                                                                                                                              │
│    ├── final.contigs_provirus_marker_classification.tsv (provirus classification: tabular format)                                                                                                              │
│    └── final.contigs_provirus_marker_classification.npz (provirus classification: binary format)                                                                                                               │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[15:49:26] Executing genomad marker-classification.                                                                                                                                                               
[15:49:26] Previous execution detected. Steps will be skipped unless their outputs are not found. Use the --restart option to force the execution of all the steps again.                                         
[15:49:26] final.contigs_features.npz was found. Skipping feature computation.                                                                                                                                    
[15:49:26] Sequence features in tabular format written to final.contigs_features.tsv.                                                                                                                             
[15:49:26] final.contigs_provirus_features.npz was found. Skipping provirus feature computation.                                                                                                                  
[15:49:26] Provirus features in tabular format written to final.contigs_provirus_features.tsv.                                                                                                                    
[15:49:26] final.contigs_marker_classification.npz was found. Skipping sequence classification.                                                                                                                   
[15:49:26] Sequence classification in tabular format written to final.contigs_marker_classification.tsv.                                                                                                          
[15:49:26] final.contigs_provirus_marker_classification.npz was found. Skipping provirus classification.                                                                                                          
[15:49:26] Provirus classification in tabular format written to final.contigs_provirus_marker_classification.tsv.                                                                                                 
[15:49:26] geNomad marker-classification finished!                                                                                                                                                                
Traceback (most recent call last):
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1305, in end_to_end
    ctx.invoke(
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 725, in nn_classification
    genomad.nn_classification.main(
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/genomad/modules/nn_classification.py", line 37, in main
    from genomad import neural_network
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/genomad/neural_network/__init__.py", line 1, in <module>
    from .model import create_classifier
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/genomad/lib/python3.10/site-packages/genomad/neural_network/model.py", line 2, in <module>
    from tensorflow.keras import Model
ModuleNotFoundError: No module named 'tensorflow.keras'

apcamargo commented 1 month ago

Hi @flefler

This is very strange. Keras and TensorFlow are dependencies that should have been installed. Can you run this command within the environment and send me the output?

python -c "import tensorflow as tf; print(tf.__version__)"

I think the problem here is that newer TensorFlow versions probably changed the way Keras is imported. If that's the case, you'll be able to fix the issue by installing an older version of TensorFlow (and I'll update geNomad accordingly).
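To narrow down which import is failing, a small diagnostic along these lines could be run inside the environment. It is only a sketch: `module_available` is a hypothetical helper written for this thread, not part of geNomad, TensorFlow, or Keras, and it simply attempts each import and reports whether it succeeded.

```python
import importlib


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so this also
        # covers the "No module named 'tensorflow.keras'" case.
        return False
    return True


if __name__ == "__main__":
    # These are the modules involved in the traceback above.
    for name in ("tensorflow", "keras", "tensorflow.keras"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If `tensorflow` and `keras` are reported as found but `tensorflow.keras` is missing, that would be consistent with the traceback: both packages are installed, but the import path geNomad uses is not available.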

flefler commented 1 month ago

Below is the output of the command

(genomad) [flefler@login12]$ python -c "import tensorflow as tf; print(tf.__version__)"
2024-05-25 01:23:06.695325: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2.16.1

Here are the versions of keras, tensorflow, and tensorboard in the conda environment. I'm not sure if this is useful; I can provide the full output of conda list if that would help.


# Name                    Version                   Build  Channel
keras                     3.3.3              pyhd8ed1ab_0    conda-forge
tensorboard               2.16.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0           py310h75e40e8_1    conda-forge
tensorflow                2.16.1          cpu_py310h49b650b_0    conda-forge
tensorflow-base           2.16.1          cpu_py310h224022f_0    conda-forge
tensorflow-estimator      2.16.1          cpu_py310hc6dcfef_0    conda-forge

apcamargo commented 1 month ago

Ok, so the problem is that geNomad is not compatible with keras >= 3 and tensorflow >= 2.16. Can you downgrade to keras < 3 and tensorflow < 2.16? I'll fix this in Bioconda so that future installs won't have this issue.
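For anyone who wants to check an existing environment against the incompatibility described here (keras >= 3, tensorflow >= 2.16), a rough sketch follows. The names `parse_version`, `is_compatible`, and `UPPER_BOUNDS` are made up for illustration, and the script assumes both packages expose standard Python package metadata under the names `tensorflow` and `keras`, which may not hold for every build.

```python
from importlib import metadata

# Exclusive upper bounds, taken from the incompatibility stated in this thread.
UPPER_BOUNDS = {"tensorflow": (2, 16), "keras": (3,)}


def parse_version(text):
    """Turn '2.16.1' into (2, 16, 1); suffixes such as 'rc0' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(package, version):
    """Return True if `version` is strictly below the bound for `package`."""
    return parse_version(version) < UPPER_BOUNDS[package]


if __name__ == "__main__":
    for package in UPPER_BOUNDS:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            print(f"{package}: not installed")
            continue
        status = "ok" if is_compatible(package, installed) else "too new"
        print(f"{package} {installed}: {status}")
```

With the versions listed above (tensorflow 2.16.1 and keras 3.3.3), both packages would be reported as too new.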

apcamargo commented 1 month ago

This was fixed in the Bioconda recipe. New installs won't have this issue anymore.

flefler commented 1 month ago

Great. Thank you for your help!