apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Error at NN step when trying to process a single sample #78

Closed simroux closed 3 months ago

simroux commented 4 months ago

When the input file has only one sequence, the NN step can fail with the following error:

Blas xGEMV launch failed : a.shape=[1,2100,512], b.shape=[1,512,1], m=2100, n=1, k=512
         [[{{node model_1/model/igloo1d_kernel/MatMul}}]] [Op:__inference_predict_function_983]

This can be fixed by setting the environment variable CUDA_VISIBLE_DEVICES (tested with values of 1 and -1). The variable is not set by default.
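
A minimal sketch of this workaround in Python, assuming geNomad is launched as a subprocess. The helper names `cpu_only_env` and `run_genomad` and the path arguments are hypothetical; only the variable name and the value -1 come from the report above. A value of -1 matches no GPU, so TensorFlow should fall back to the CPU.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # "-1" matches no GPU, so TensorFlow falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_genomad(input_fasta, output_dir, database_dir):
    """Run `genomad end-to-end` with GPUs hidden (arguments are placeholders)."""
    cmd = ["genomad", "end-to-end", input_fasta, output_dir, database_dir]
    return subprocess.run(cmd, env=cpu_only_env(), check=True)
```

The variable has to be present in the environment of the process that loads TensorFlow, which is why the sketch passes it explicitly through `env=` instead of relying on the caller's shell.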

apcamargo commented 4 months ago

Fixed in https://github.com/apcamargo/genomad/commit/8463c914e43af689930dd5aaf34e4f345c0c1990. I'll close the issue once a new release is published.

leannmlindsey commented 4 months ago

I am also having this error.

apcamargo commented 4 months ago

@leannmlindsey I'll release the new update today. Did setting the environment variable fix it for you?

leannmlindsey commented 4 months ago

I'm actually not sure it is the same error. It is also on the NN step. This is the error I am getting:

[15:01:44] Executing genomad nn-classification.
[15:01:44] Previous execution detected. Steps will be skipped unless their
outputs are not found. Use the --restart option to force the
execution of all the steps again.
[15:01:44] 11436X42_ds_encoded_sequences was found. Skipping sequence encoding.

Traceback (most recent call last):
  File "/jet/home/lindseyl/.local/bin/genomad", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/genomad/cli.py", line 1276, in end_to_end
    ctx.invoke(
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/genomad/cli.py", line 719, in nn_classification
    genomad.nn_classification.main(
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/genomad/modules/nn_classification.py", line 308, in main
    nn_model = neural_network.create_classifier()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/genomad/neural_network/model.py", line 34, in create_classifier
    encoder = create_encoder()
              ^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/genomad/neural_network/model.py", line 14, in create_encoder
    inputs = Input(shape=5_997, dtype="int64")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/keras/src/layers/core/input_layer.py", line 143, in Input
    layer = InputLayer(
            ^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/keras/src/layers/core/input_layer.py", line 46, in __init__
    shape = backend.standardize_shape(shape)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jet/home/lindseyl/.local/share/pipx/venvs/genomad/lib/python3.12/site-packages/keras/src/backend/common/variables.py", line 406, in standardize_shape
    raise ValueError(f"Cannot convert '{shape}' to a shape.")
ValueError: Cannot convert '5997' to a shape.

apcamargo commented 4 months ago

This is new. Can you set CUDA_VISIBLE_DEVICES to -1 and run genomad nn-classification with the --restart flag?

What is your environment? Maybe this has to do with the TensorFlow/Keras versions.

leannmlindsey commented 4 months ago

Unfortunately, using CUDA_VISIBLE_DEVICES=-1 still resulted in the same error (though it took much longer to run).

I was unable to install via conda, so I installed via the pipx instructions.

I can't seem to see any tensorflow installation. I have tried:

conda list --> no keras or tensorflow listed

(genomad) [lindseyl@v007 CHPC]$ which tensorflow
/usr/bin/which: no tensorflow in (/jet/home/lindseyl/.conda/envs/genomad/bin:/opt/packages/anaconda3-2022.10/condabin:/opt/packages/anaconda3-2022.10/bin:/jet/home/lindseyl/edirect:/jet/home/lindseyl/.local/bin:/jet/home/lindseyl/bin:/opt/packages/psc.allocations.user/bin:/opt/packages/allocations/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/packages/interact/bin:/opt/puppetlabs/bin)

(genomad) [lindseyl@v007 CHPC]$ pip show tensorflow
WARNING: Package(s) not found: tensorflow
(genomad) [lindseyl@v007 CHPC]$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'tensorflow'
>>> exit()

I then tried to install tensorflow via conda into the genomad conda environment but it was unsuccessful

leannmlindsey commented 4 months ago

I will try a complete reinstall and see if I can narrow in where the issue starts.

leannmlindsey commented 4 months ago

I uninstalled the pipx env and then reinstalled and I had the same error. I think there is probably a problem with the pipx env, at least on our machines.

I will try later today to install Keras and TensorFlow separately.

apcamargo commented 4 months ago

> I can't seem to see any tensorflow installation. I have tried:
>
> conda list --> no keras or tensorflow listed
>
> (genomad) [lindseyl@v007 CHPC]$ which tensorflow
> /usr/bin/which: no tensorflow in (/jet/home/lindseyl/.conda/envs/genomad/bin:/opt/packages/anaconda3-2022.10/condabin:/opt/packages/anaconda3-2022.10/bin:/jet/home/lindseyl/edirect:/jet/home/lindseyl/.local/bin:/jet/home/lindseyl/bin:/opt/packages/psc.allocations.user/bin:/opt/packages/allocations/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/packages/interact/bin:/opt/puppetlabs/bin)

Did you try this in your pipx installation? If so, it makes sense that you can't find TensorFlow and that the CUDA_VISIBLE_DEVICES fix didn't work. pipx creates a separate environment, so you won't be able to access it from your "standard" environment, and the environment variables you set there won't affect the pipx env. I'll release a new version today that will set CUDA_VISIBLE_DEVICES within the execution, so this shouldn't be a problem anymore.
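
One way to check which TensorFlow and Keras versions geNomad actually sees is to query package metadata from the interpreter inside the pipx environment. The sketch below uses only the standard library. The function names and the script name `check_versions.py` are hypothetical, and the interpreter path `/jet/home/lindseyl/.local/share/pipx/venvs/genomad/bin/python` is an assumption inferred from the traceback paths earlier in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("genomad", "tensorflow", "keras")):
    """Map each package name to its version in the current interpreter."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Saved as `check_versions.py` and run with the pipx environment's interpreter, this reports the packages of that environment and not those of the active conda environment. `pipx runpip genomad list` should give similar information.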

What is bothering me is that you couldn't install geNomad via Conda. What error did you get?

leannmlindsey commented 3 months ago

I tried again with conda and the install finished with no errors this time (I may not have waited long enough last time; I think I assumed it had timed out).

But when I run it, it does not get as far as the pipx install did.

This is the error:

TIME: Start: = 2024-03-04 11:24:07

leannmlindsey commented 3 months ago

Processing each file takes significantly longer with the CUDA_VISIBLE_DEVICES fix, so I would like to get TensorFlow/Keras working so that I can process about 350 files.

I will try to install those separately and see if that works.

leannmlindsey commented 3 months ago

I have access to another CHPC system (Bridges2 at Pittsburgh Supercomputing Center), so I tried the conda install there. It was successful and the test ran to completion with no errors.

I think the problems with the conda install may be isolated to the University of Utah system.

erinyoung commented 3 months ago

I have something to add to this. I also ran into the following error:

402.1 Traceback (most recent call last):
402.1   File "/usr/local/bin/genomad", line 8, in <module>
402.1     sys.exit(cli())
402.1   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
402.1     return self.main(*args, **kwargs)
402.1   File "/usr/local/lib/python3.10/dist-packages/rich_click/rich_command.py", line 126, in main
402.1     rv = self.invoke(ctx)
402.1   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
402.1     return _process_result(sub_ctx.command.invoke(sub_ctx))
402.1   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
402.1     return ctx.invoke(self.callback, **ctx.params)
402.1   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
402.1     return __callback(*args, **kwargs)
402.1   File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
402.1     return f(get_current_context(), *args, **kwargs)
402.1   File "/usr/local/lib/python3.10/dist-packages/genomad/cli.py", line 1276, in end_to_end
402.1     ctx.invoke(
402.1   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
402.1     return __callback(*args, **kwargs)
402.1   File "/usr/local/lib/python3.10/dist-packages/genomad/cli.py", line 719, in nn_classification
402.1     genomad.nn_classification.main(
402.1   File "/usr/local/lib/python3.10/dist-packages/genomad/modules/nn_classification.py", line 309, in main
402.1     nn_model = neural_network.create_classifier()
402.1   File "/usr/local/lib/python3.10/dist-packages/genomad/neural_network/model.py", line 34, in create_classifier
402.1     encoder = create_encoder()
402.1   File "/usr/local/lib/python3.10/dist-packages/genomad/neural_network/model.py", line 14, in create_encoder
402.1     inputs = Input(shape=5_997, dtype="int64")
402.1   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/input_layer.py", line 143, in Input
402.1     layer = InputLayer(
402.1   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/input_layer.py", line 46, in __init__
402.1     shape = backend.standardize_shape(shape)
402.1   File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/common/variables.py", line 406, in standardize_shape
402.1     raise ValueError(f"Cannot convert '{shape}' to a shape.")
402.1 ValueError: Cannot convert '5997' to a shape.

This happened after installing genomad via pip (Aragorn and mmseqs2 were installed from source).

I was able to circumvent this problem by also pinning the keras and tensorflow versions to match those in the bioconda recipe (both were 2.15.0 for genomad 1.7.5) with something like:

pip install genomad==1.7.5 tensorflow==2.15.0 keras==2.15.0

apcamargo commented 3 months ago

@erinyoung that's good to know. Do you know which versions were installed when you used pip?

erinyoung commented 3 months ago

tensorflow was 2.16.1 (tensorboard was 2.16.2) and keras was 3.0.5.

Here's the entire list:

$ pip list
Package                   Version
------------------------- -----------
absl-py                   2.1.0
archspec                  0.2.1
astunparse                1.6.3
attmap                    0.13.2
attrs                     23.2.0
biopython                 1.83
bleach                    6.1.0
boltons                   23.0.0
Brotli                    1.0.9
cattrs                    23.2.3
certifi                   2024.2.2
cffi                      1.16.0
cfgv                      3.3.1
charset-normalizer        2.0.4
Cheetah3                  3.2.6.post2
click                     8.1.7
colorama                  0.4.6
conda                     23.11.0
conda-content-trust       0.2.0
conda-libmamba-solver     23.12.0
conda-package-handling    2.2.0
conda_package_streaming   0.9.0
cryptography              41.0.7
distlib                   0.3.8
distro                    1.8.0
dm-tree                   0.1.8
docutils                  0.20.1
eido                      0.2.2
exceptiongroup            1.2.0
filelock                  3.13.1
filetype                  1.2.0
flatbuffers               24.3.7
future                    1.0.0
galaxy-tool-util          23.2.1
galaxy-util               23.2.1
gast                      0.5.4
genomad                   1.7.5
gitdb                     4.0.11
GitPython                 3.1.42
google-pasta              0.2.0
graphviz                  0.20.1
greenlet                  3.0.3
grpcio                    1.62.1
h5py                      3.10.0
identify                  2.5.35
idna                      3.4
importlib-metadata        7.0.1
importlib_resources       6.1.2
iniconfig                 2.0.0
itsdangerous              2.1.2
Jinja2                    3.1.3
jsonpatch                 1.32
jsonpointer               2.1
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
keras                     3.0.5
libclang                  16.0.6
libmambapy                1.5.6
linkify-it-py             2.0.3
llvmlite                  0.42.0
logmuse                   0.2.6
lxml                      5.1.0
mamba                     1.5.6
Markdown                  3.5.2
markdown-it-py            3.0.0
MarkupSafe                2.1.5
mdit-py-plugins           0.4.0
mdurl                     0.1.2
menuinst                  2.0.2
ml-dtypes                 0.3.2
namex                     0.0.7
nf-core                   2.13.1
nodeenv                   1.8.0
numba                     0.59.0
numpy                     1.26.4
opt-einsum                3.3.0
oyaml                     1.0
packaging                 23.1
pandas                    2.2.1
pdiff                     1.1.4
peppy                     0.40.1
pillow                    10.2.0
pip                       23.3.1
piper                     0.14.0
pipestat                  0.8.2
pkgutil_resolve_name      1.3.10
platformdirs              3.10.0
pluggy                    1.0.0
pre_commit                3.6.2
prompt-toolkit            3.0.36
protobuf                  4.25.3
psutil                    5.9.8
psycopg2                  2.9.9
pycosat                   0.6.6
pycparser                 2.21
pydantic                  1.10.13
pyfaidx                   0.8.1.1
Pygments                  2.17.2
pyparsing                 3.1.1
pyrodigal                 3.3.0
pyrodigal-gv              0.3.1
PySocks                   1.7.1
pytest                    7.4.4
pytest-workflow           2.0.1
python-crfsuite           0.9.10
python-dateutil           2.8.2
pytz                      2024.1
PyVCF3                    1.0.3
PyYAML                    6.0.1
questionary               2.0.1
referencing               0.33.0
refgenconf                0.12.2
refgenie                  0.12.1
repoze.lru                0.7
requests                  2.31.0
requests-cache            1.2.0
rich                      13.7.1
rich-click                1.7.3
Routes                    2.5.1
rpds-py                   0.18.0
ruamel.yaml               0.17.21
ruff                      0.2.2
scipy                     1.12.0
setuptools                68.2.2
six                       1.16.0
smmap                     5.0.0
sortedcontainers          2.4.0
SQLAlchemy                2.0.27
sqlmodel                  0.0.14
tabulate                  0.9.0
taxopy                    0.12.0
tensorboard               2.16.2
tensorboard-data-server   0.7.2
tensorflow                2.16.1
termcolor                 2.4.0
textual                   0.52.1
tomli                     2.0.1
tqdm                      4.65.0
trogon                    0.5.0
truststore                0.8.0
typing_extensions         4.10.0
tzdata                    2024.1
ubiquerg                  0.7.0
uc-micro-py               1.0.3
ujson                     5.9.0
ukkonen                   1.0.1
url-normalize             1.4.3
urllib3                   2.1.0
veracitools               0.1.3
virtualenv                20.25.1
wcwidth                   0.2.13
webencodings              0.5.1
Werkzeug                  3.0.1
wheel                     0.41.2
Whoosh                    2.7.4
wrapt                     1.16.0
xgboost                   2.0.3
yacman                    0.9.3
zipp                      3.17.0
zipstream-new             1.1.8
zstandard                 0.19.0

apcamargo commented 3 months ago

Thanks @erinyoung and @leannmlindsey. I fixed this in version 1.7.6.

The problem is that Keras 3.0 introduced some incompatibilities. In the beginning, only people who installed geNomad through pip would notice this, but now that Keras 3 is in conda-forge the problem has become more common. geNomad 1.7.6 simply caps the Keras version below 3.0.
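
As an illustration of the incompatibility, the tracebacks in this thread show Keras 3.0.5 rejecting the bare integer in `Input(shape=5_997, dtype="int64")`. The sketch below shows two independent guards in plain Python: a version check mirroring the "below 3.0" cap, and a helper that wraps an integer shape in a tuple. All three function names are hypothetical, and this is not the change made in geNomad 1.7.6, which only restricts the Keras version.

```python
def keras_major(version):
    """Return the major component of a version string such as '3.0.5'."""
    return int(version.split(".")[0])


def is_supported_keras(version):
    """True for Keras releases below 3.0, the cap described above."""
    return keras_major(version) < 3


def as_shape(shape):
    """Wrap a bare integer in a 1-tuple; return other iterables as tuples."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(shape)


print(is_supported_keras("2.15.0"))  # True
print(is_supported_keras("3.0.5"))   # False
print(as_shape(5_997))               # (5997,)
```

Passing a tuple such as `(5997,)` as `shape` is the documented form for the Keras `Input` function, so a call like `Input(shape=as_shape(5_997), dtype="int64")` should avoid this particular error. Other Keras 3 differences may still apply.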