Psy-Fer / deeplexicon

Signal based nanopore RNA demultiplexing with convolutional neural networks
https://psy-fer.github.io/deeplexicon/
MIT License

Significant increase in unknown/unassigned reads - change in GPU architecture??? #28

Closed DepledgeLab closed 10 months ago

DepledgeLab commented 10 months ago

Hello,

The HPC that I use for deeplexicon recently received a hardware upgrade that broke my installation. I have since rebuilt it and am now performing the demultiplexing step on an NVIDIA A100 80GB PCIe (previously Tesla V100-SXM2-16GB). However, I am getting a huge number of unknown/unassigned reads, even when repeating previous demultiplexing runs...

For instance,

Original Tesla V100 run

         1 Barcode
    225803 bc_1
    124918 bc_2
    172721 bc_3
     91739 bc_4
     19497 unknown

New NVIDIA A100 run

         1 Barcode
     86091 bc_1
     29855 bc_2
         6 bc_3
        49 bc_4
    518677 unknown
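To put the two tallies above on a common scale, here is a minimal Python sketch that computes the fraction of unassigned reads in each run. The counts are copied from this comment; the function name `unassigned_fraction` and the dictionary layout are illustrative assumptions, not part of deeplexicon.

```python
# Barcode tallies copied from the two runs reported above.
v100_counts = {"bc_1": 225803, "bc_2": 124918, "bc_3": 172721,
               "bc_4": 91739, "unknown": 19497}
a100_counts = {"bc_1": 86091, "bc_2": 29855, "bc_3": 6,
               "bc_4": 49, "unknown": 518677}


def unassigned_fraction(counts):
    """Return the share of reads labelled 'unknown' (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in tally")
    return counts.get("unknown", 0) / total


if __name__ == "__main__":
    for label, counts in (("V100", v100_counts), ("A100", a100_counts)):
        print(f"{label}: {sum(counts.values())} reads, "
              f"{unassigned_fraction(counts):.1%} unassigned")
```

Both runs contain the same 634678 reads, so only the assignments differ: roughly 3% of reads are unassigned on the V100 versus roughly 82% on the A100.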

I guess I don't understand enough about GPU processes to know if the GPU itself is causing a problem or something else. I would really appreciate any help you can give on this front.

The (rebuilt) virtual environment I am using has the following packages installed (pip freeze).

absl-py==0.7.1
astor==0.8.0
cycler==0.10.0
gast==0.2.2
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.10.0
joblib==0.13.2
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
llvmlite==0.36.0
Markdown==3.1.1
matplotlib==3.1.1
mock==5.1.0
numba==0.53.0
numpy==1.17.0
ont-fast5-api==3.3.0
packaging==23.2
pandas==0.25.0
progressbar33==2.4
protobuf==3.9.1
pyparsing==2.4.2
python-dateutil==2.8.0
pyts==0.8.0
pytz==2019.2
PyYAML==5.1.2
scikit-learn==0.21.3
scipy==1.3.1
six==1.12.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
termcolor==1.1.0
Werkzeug==0.15.5
wrapt==1.11.2

DepledgeLab commented 10 months ago

A small update. I managed to find a node on the HPC that was still running the older Tesla V100 and the results now look 'normal' again in terms of most reads being assigned a specific barcode.

However, I am still concerned about why this doesn't happen with the A100 cards and would appreciate your thoughts on this. Is there any reason to suspect that the 'normal' results from the V100 might also be compromised?

DepledgeLab commented 10 months ago

I have belatedly discovered this issue has been reported in other forms here and [here](https://github.com/biocorecrg/BioNextflow/issues/17).

A warning on the GitHub page (and other places) about not using newer Nvidia GPU architectures is definitely needed.

enovoa commented 10 months ago

Hi @DepledgeLab, yes, we ran into the same issue ourselves when we started testing new GPU architectures, and we were the ones who opened the issue you refer to above. I fully agree that we should put a warning in the README; I will fix this now.

PS. To solve the GPU issue and also improve the speed of demultiplexing, we've been working on an alternative approach to demultiplexing direct RNA runs. The code is embedded in the new version of MasterOfPores (version 3), which we will release once we fix a couple of final things. It should be ready very soon, and I will comment here once it is public. Thanks!

enovoa commented 10 months ago

Updated the README with a WARNING about the required CUDA version.
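For readers who hit the same problem, a minimal Python sketch of a pre-flight check that flags a GPU as too new for the environment before demultiplexing. The thread only establishes that newer NVIDIA architectures and the CUDA version are involved; the cut-off used here is an assumption based on the prebuilt tensorflow-gpu 1.13.1 wheels targeting CUDA 10.0, which predates the Ampere generation (compute capability 8.x). The function name `check_gpu` and the lookup table are hypothetical and not part of deeplexicon.

```python
# NVIDIA's published compute capabilities for the two cards in this thread.
KNOWN_COMPUTE_CAPABILITY = {
    "Tesla V100-SXM2-16GB": (7, 0),   # Volta
    "NVIDIA A100 80GB PCIe": (8, 0),  # Ampere
}

# Assumption: CUDA 10.0 (used by the tensorflow-gpu 1.13.1 wheels) knows
# architectures only up to compute capability 7.5.
MAX_SUPPORTED = (7, 5)


def check_gpu(name, max_supported=MAX_SUPPORTED):
    """Classify a GPU as 'ok', 'too_new', or 'unknown' (hypothetical helper)."""
    capability = KNOWN_COMPUTE_CAPABILITY.get(name)
    if capability is None:
        return "unknown"  # not in the table: cannot judge either way
    # Tuples compare element-wise, so (8, 0) > (7, 5).
    return "ok" if capability <= max_supported else "too_new"


if __name__ == "__main__":
    for gpu in KNOWN_COMPUTE_CAPABILITY:
        print(gpu, "->", check_gpu(gpu))
```

A job script could run such a check and refuse to start on a `too_new` card, rather than silently producing mostly unassigned reads.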