Hi!
From what you wrote I assume that you don't get any error message and learnMSA runs correctly other than the fact that it doesn't recognize the A100.
Could you try to use a single GPU instead (`-d 0`)? There might be a problem with multi-GPU support that I have to fix first.
If a single GPU is still not detected, can you verify that your tensorflow and CUDA/cuDNN versions are compatible? (They have to be, unfortunately, and it can be a hassle.) There is a table here that will help.
Since in your case `CUDA Version: 12.0` seems to be installed, upgrading tensorflow to the newest version could help.
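For reference, you can check which CUDA/cuDNN versions your current TensorFlow build expects and whether it sees the GPUs at all. This is just a minimal sketch to run inside the environment you use for learnMSA:

```
# Show the CUDA/cuDNN versions this TensorFlow build was compiled against
python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"

# List the GPUs TensorFlow actually detects (an empty list means none are visible)
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```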
Hi,
Yes, learnMSA runs correctly otherwise. I tried with a single GPU, but I still get the same problem. I think the issue might be the versions of tensorflow and CUDA on our system; I am trying to fix that now. Hopefully it doesn't give me too much trouble.
By chance, we also have A100 GPUs with CUDA 12.0 installed on our cluster. I can confirm that with `tensorflow==2.13.0` the GPUs are not recognized, but everything works well under `tensorflow==2.10.0`.
We should be able to solve your problem this way:
```
conda install mamba
mamba create -n learnMSA_env tensorflow==2.10.0 learnMSA
mamba activate learnMSA_env
```
You can also skip the first step and replace `mamba` with `conda`, but solving the environment is usually much slower that way.
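A quick sanity check after activating the environment could look like this (just a sketch; the input/output file names are placeholders taken from your command):

```
# Confirm TensorFlow 2.10 is installed and detects the A100s
python -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"

# Then try learnMSA on a single GPU first
learnMSA -i input.fasta -o output.fasta -d 0
```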
Hi again,
I tried what you suggested, but it does not work for me. I get this error:

```
/pasteur/appa/homes/abutkovi/anaconda3/envs/learmsa2/lib/python3.11/site-packages/conda_package_streaming/package_streaming.py:19: UserWarning: zstandard could not be imported. Running without .conda support.
  warnings.warn("zstandard could not be imported. Running without .conda support.")
/pasteur/appa/homes/abutkovi/anaconda3/envs/learmsa2/lib/python3.11/site-packages/conda_package_handling/api.py:29: UserWarning: Install zstandard Python bindings for .conda support
  _warnings.warn("Install zstandard Python bindings for .conda support")

Looking for: ['tensorflow==2.10.0', 'learnmsa', 'famsa']

pkgs/main/noarch       851.6kB @ 4.4MB/s  0.2s
pkgs/r/linux-64          1.4MB @ 4.7MB/s  0.3s
pkgs/r/noarch            1.3MB @ 2.8MB/s  0.3s
pkgs/main/linux-64       5.9MB @ 5.5MB/s  1.1s
conda-forge/noarch      13.4MB @ 5.2MB/s  2.6s
conda-forge/linux-64    33.2MB @ 5.5MB/s  6.3s

Could not solve for environment specs
Encountered problems while solving:
The environment can't be solved, aborting the operation
```
I also tried `pip install tensorflow==2.10.0 logomaker networkx seaborn`, but it doesn't find tensorflow 2.10:

```
ERROR: Could not find a version that satisfies the requirement tensorflow==2.10.0 (from versions: 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0)
ERROR: No matching distribution found for tensorflow==2.10.0
```

Do you maybe have an idea why mamba doesn't find learnmsa?
I tried installing the zstandard library, but conda says it is already installed.
Have you set up Bioconda channels (see here)? You only have to do this once.
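For reference, the usual one-time Bioconda channel setup looks like this (as described in the Bioconda documentation; I am assuming a reasonably recent conda):

```
# One-time Bioconda channel configuration (channel order matters)
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
```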
Yes, there was a different problem with conda but I managed to fix it. Let's see if the program uses the GPUs now. Thanks for your help!
Hi,
Unfortunately, I couldn't get learnMSA to run with the GPU option, but it seems to be an issue with our cluster. Since everybody is on vacation, I will see if I can fix it when people come back; for now I am running it on CPU.
Thanks for your help and have a lovely August, Anamarija
Thanks for reporting back!
Let me know how it turns out and if I can further assist with anything.
Have a great summer as well!
I will close this issue. Feel free to reopen it if necessary.
Hi,
First of all, thank you for this wonderful tool. I love it: it works great for my sequences, it is fast, and it aligns them correctly.
However, I keep having a problem: the program doesn't recognize the available GPUs on our cluster. I have a large number of sequences (around 11,000), and using a GPU would really cut my computation time. I am attaching the output from the `nvidia-smi` command:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:01:00.0 Off |                    0 |
| N/A   27C    P0    50W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:C1:00.0 Off |                    0 |
| N/A   25C    P0    50W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
As you can see, there are GPUs available, we have CUDA and cuDNN installed (via `module load`), and I run this in the dedicated GPU partition on Slurm:

```
srun --partition=gpu --qos=gpu --gres=gpu:A100:2 --mem=30G learnMSA -i input.fasta -d 0,1 -o output.fasta
```

I am a bit lost as to what the problem might be or what I am doing wrong.
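In case it helps, this is the kind of check I can run inside the same allocation (same srun flags as above; the bash one-liner is just an illustration) to see what the job itself exposes:

```
# Check that the allocation actually exposes the GPUs before starting learnMSA
srun --partition=gpu --qos=gpu --gres=gpu:A100:2 --mem=30G \
  bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'
```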
Can you help me with this, please?
Thanks again for such a wonderful tool and have a great day! Anamarija