ccruizm opened this issue 1 year ago
@ccruizm seems you can invoke the GPU utility with `from spectra import spectra_gpu as spc_gpu`, and all the commands are then available as `spc_gpu.<...>`. It still seems experimental and prints the following info on import:

> Spectra GPU support is still under development. Raise any issues on github
> Changes from v1:
> (1) GPU support [see tutorial]
> (2) minibatching for local parameters and data
> Note that minibatching may affect optimization results
> Code will eventually be merged into spectra.py

I have tried it on a V100 GPU and it takes forever, even for 2 epochs on my data. I am not sure whether any further preprocessing is required. I think I should try the Spectra test data and see what happens.
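In case it is useful, a quick and Spectra-agnostic way to see which functions the GPU module actually exposes (nothing beyond the import quoted above is assumed here):

```python
from spectra import spectra_gpu as spc_gpu  # prints the development notice quoted above

# List the public names (functions, classes) exposed by the GPU module
print([name for name in dir(spc_gpu) if not name.startswith("_")])
```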
hi all,
I'm in the process of adding a new GPU file.
Thanks for your comments @kvshams, and looking forward to testing the new GPU implementation @russellkune!
I am now running a parallel comparison between CPU and GPU. My dataset is 120K cells, 460 gene sets, and five main cell types. On the CPU, the estimated time to complete 10000 epochs is just under 32 h (~12 s/it). Importing `spectra_gpu` on a cluster node with an NVIDIA A100-SXM4-40GB now shows an estimated time of 10 h (3.7 s/it). However, I am puzzled that when I check the GPU's resource usage (`nvidia-smi`, `torch.cuda.memory_allocated()`, or `torch.cuda.memory_cached()`) I do not see any memory allocation or running processes. I assume it is using the GPU, since it reduces the running time by 3x, but I am not sure why it does not show up.
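One thing worth noting (my assumption, not something confirmed in this thread): `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` only report allocations made by the calling process, so querying them from a separate notebook or session, or before the model has been moved to the device, will show zero even while another process is busy on the GPU. A minimal, Spectra-independent sketch of an in-process check:

```python
import torch

# Only meaningful when run in the same process that owns the model/data.
if torch.cuda.is_available():
    device = torch.device("cuda")
    # Placeholder tensor standing in for model parameters / minibatch data.
    x = torch.randn(10_000, 1_000, device=device)
    print(f"Allocated: {torch.cuda.memory_allocated(device) / 1024**2:.1f} MiB")
    print(f"Reserved:  {torch.cuda.memory_reserved(device) / 1024**2:.1f} MiB")
else:
    print("CUDA is not available")
```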
That's a great improvement, but I'm not sure why it's not showing in the resource usage. Is there a branch where I can try the new implementation?
It is implemented in the master branch. I only changed the import to `from spectra import spectra_gpu as spc` and kept everything else the same.
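For illustration, a sketch of what "keeping everything the same" looks like around the main fitting call. The `est_spectra` name and the `use_highly_variable` flag are mentioned elsewhere in this thread, but the remaining keyword arguments and the `adata` / `gene_set_dictionary` objects are placeholders, so the exact signature should be checked against the tutorial:

```python
# from spectra import spectra as spc      # CPU version
from spectra import spectra_gpu as spc    # GPU version: only the import changes

# Placeholders: an AnnData object and a gene-set dictionary prepared exactly as
# for the CPU run (see the Spectra tutorial for the expected structure).
model = spc.est_spectra(
    adata=adata,
    gene_set_dictionary=gene_set_dictionary,
    use_highly_variable=False,  # flag mentioned later in this thread
    # ...all remaining arguments kept identical to the CPU run
)
```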
Thank you for the very quick response.
Are there any specific commands or preprocessing steps required? I tried it yesterday. The CPU version runs fine, but the GPU version does not seem to work even for 2 epochs. Am I missing something? Thanks, Shams
Not at all. I am running exactly the same pipeline, only with the change I mentioned before. Maybe there is some issue with `torch` recognizing your GPU (I have seen that with other tools/environment setups)? If you run:
```python
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")

    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()
    print(f"Number of available GPUs: {num_gpus}")

    # Get the name of each available GPU
    for i in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(i)
        print(f"GPU {i}: {gpu_name}")
else:
    print("CUDA is not available")
```
Do you see your GPU?
Yes, the GPU is detected in the session. I just modified the snippet above:
```python
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")

    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()
    print(f"Number of available GPUs: {num_gpus}")

    # Get the name and memory status of each available GPU
    for i in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(i)
        print(f"GPU {i}: {gpu_name}")

        # Get the memory information
        gpu_memory = torch.cuda.get_device_properties(i).total_memory
        gpu_memory_allocated = torch.cuda.memory_allocated(i)
        gpu_memory_cached = torch.cuda.memory_cached(i)
        gpu_memory_free = gpu_memory - gpu_memory_allocated - gpu_memory_cached
        print(f"\tTotal Memory: {gpu_memory / 1024**3:.2f} GB")
        print(f"\tAllocated Memory: {gpu_memory_allocated / 1024**3:.2f} GB")
        print(f"\tCached Memory: {gpu_memory_cached / 1024**3:.2f} GB")
        print(f"\tFree Memory: {gpu_memory_free / 1024**3:.2f} GB")
else:
    print("CUDA is not available")
```
which gives:

CUDA is available
Number of available GPUs: 1
GPU 0: Tesla V100-SXM2-32GB
    Total Memory: 31.75 GB
    Allocated Memory: 0.00 GB
    Cached Memory: 0.00 GB
    Free Memory: 31.75 GB
My session info is
-----
numpy 1.23.5
pandas 2.0.0
scanpy 1.9.3
scipy 1.10.1
session_info 1.0.0
spectra NA
torch 2.0.0.post200
-----
Modules imported as dependencies:
PIL 9.5.0
anndata 0.9.1
asttokens NA
awkward 2.2.0
awkward_cpp NA
backcall 0.2.0
cairo 1.23.0
cffi 1.15.1
comm 0.1.3
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dot_parser NA
executing 1.2.0
gmpy2 2.1.2
google NA
h5py 3.8.0
igraph 0.10.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.23.1
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
joblib 1.2.0
jsonpickle 3.0.1
kiwisolver 1.4.4
leidenalg 0.9.1
llvmlite 0.39.1
markupsafe 2.1.2
matplotlib 3.7.1
mpl_toolkits NA
mpmath 1.3.0
natsort 8.3.1
networkx 3.1
numba 0.56.4
numexpr 2.8.4
nvfuser NA
opt_einsum v3.3.0
packaging 23.1
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.5.1
plotly 5.14.1
prompt_toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pydot 1.4.2
pygments 2.15.1
pyparsing 3.0.9
pytz 2023.3
pyvis 0.3.2
regex 2.5.129
setuptools 67.7.2
six 1.16.0
sklearn 1.2.2
stack_data 0.6.2
sympy 1.12
texttable 1.6.7
threadpoolctl 3.1.0
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions NA
wcwidth 0.2.6
yaml 6.0
zipp NA
zmq 25.0.2
zoneinfo NA
-----
IPython 8.13.2
jupyter_client 8.2.0
jupyter_core 5.3.0
notebook 6.5.4
-----
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
Linux-3.10.0-1160.11.1.el7.x86_64-x86_64-with-glibc2.17
-----
Session information updated at 2023-06-15 10:39
@russellkune Seems there are two independent problems.

- The delay I was observing was due to running different GPU jobs on the same GPU core. I was running on a core already in use (even though enough GPU memory is available, as seen above), and the process stalls at `CUDA Available: True`.
- If I run on a different core (with no other processes running on it), it runs but dies during memory allocation (this seems to be insufficient RAM rather than a GPU issue). I am waiting for a larger-memory node to see if the error persists.

My dataset is 114k cells, 14 cell types, 100 gene sets, with `use_highly_variable=False` and about 5k genes. The run prints `CUDA Available: True`, `Initializing model...`, `Building parameter set...`, and then the kernel dies.
Finally it works (#1 did work after about an hour of stalling. Not sure what is causing it, but once the process started, it was much faster, yay!). Is there any reason for not having the `label_factors` and `overlap_threshold` parameters in the GPU `est_spectra` module?
It seems to be associated with the memory allocation process. I got a GPU memory error while running multiple scripts on the same GPU core:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.77 GiB (GPU 0; 31.75 GiB total capacity; 27.05 GiB already allocated; 185.94 MiB free; 28.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
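As the error message itself suggests, one generic PyTorch knob to try (not something specific to Spectra, and the value 128 below is only an example) is the allocator's `max_split_size_mb`, which needs to be set before the first CUDA allocation:

```python
import os

# Must be set before the first CUDA allocation (e.g. at the top of the notebook/script).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var on purpose
print(torch.cuda.is_available())
```

The same variable can also be exported in the shell before launching the job.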
Nice that you made it work @kvshams! About your second point, I also experienced something similar (running on CPU, though). I raised an issue (https://github.com/dpeerlab/spectra/issues/22) but closed it because I could not reproduce it on another HPC infrastructure. If you are having the same problem, it might be worthwhile to look at why this is happening.
I thought those options were not in the spectra_gpu.py functions; I couldn't find them there. Am I missing something? Shams
I guess if I import both the GPU and CPU versions in the same notebook (i.e. `from spectra import spectra as spc` and `from spectra import spectra_gpu as spc_gpu`), the error persists, whereas if I use only the GPU version it works.

For speed, it is ~3x faster in my case too (on a 16-core CPU, 240 GB VM it is ~21 hrs, and on a Tesla V100-SXM2-32GB GPU VM with a 16-core CPU it is ~7 hrs).
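If both versions really do have to live in one notebook, a generic PyTorch workaround (not verified against Spectra) is to drop references to the first run's objects and release the cached GPU memory before starting the second run:

```python
import gc
import torch

# Placeholder standing in for whatever object holds the first (GPU) run's results.
model = torch.randn(1_000, 1_000, device="cuda") if torch.cuda.is_available() else None

del model                     # drop Python references so the tensors can be freed
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached blocks to the driver
    print(f"Allocated after cleanup: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
```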
What is the timeline for adding the new GPU file? Are there any big differences I need to be worried about when using the results from the current version?
I am also seeing the same issue. Yesterday my script was running on the GPU but did not finish due to insufficient memory. On a fresh node today it is running 3x faster, but the PID is not listed in the `nvidia-smi` output. Where is the process happening? Seems to be a PID mapping issue.
Well, I have some results to share. I was waiting for the CPU job to finish so I could compare the results, because I found some 'weird' behavior in the GPU run; it also happened in the CPU implementation, though. I will raise a new issue to ask about it.
Good day,
I would like to know whether Spectra can use a GPU. In the example notebook I could not find much information about it, but I saw there seems to be an implementation for using a GPU (https://github.com/dpeerlab/spectra/blob/main/spectra/spectra_gpu.py). Does this automatically detect that I am using a GPU node? Are there differences in the outcome between running on a CPU vs a GPU?
Thanks in advance!