broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

gpu enabled cellbender container #280

Open ryao-mdanderson opened 1 year ago

ryao-mdanderson commented 1 year ago

Dear CellBender support team:

I followed the "Using The Official Docker Image" section to pull a GPU-enabled container and tried to use it on our HPC cluster.

On a GPU node with the CUDA toolkit 11.5 module loaded (also tested with CUDA 11.2), I ran:

```
$ singularity run /risapps/singularity/repo/cellbender/0.3.0/cellbender.sif cellbender remove-background --cuda --input CellBender/raw_feature_bc_matrix.h5 --output test.h5
```

and hit this error (note: I got a test input file from the user):

```
Traceback (most recent call last):
  File "/opt/conda/bin/cellbender", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/cellbender/base_cli.py", line 120, in main
    args = cli_dict[args.tool].validate_args(args)
  File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 80, in validate_args
    assert torch.cuda.is_available(), "Trying to use CUDA, " \
AssertionError: Trying to use CUDA, but CUDA is not available.
```

I verified inside the container that `torch.cuda.is_available()` returns False:

```
$ singularity run /risapps/singularity/repo/cellbender/0.3.0/cellbender.sif /bin/bash
Singularity> which python
/opt/conda/bin/python
Singularity> python
>>> import torch; print(torch.cuda.is_available())
False
```

Is the PyTorch in the 0.3.0 container GPU-enabled? Any suggestions for fixing this when using the container version?

Thank you for your help, Rong Yao

sjfleming commented 1 year ago

Hi @ryao-mdanderson , I am not sure exactly what is causing your issue.

But let me point you to these two issues, in case they are of any assistance:

#127

#162

specifically this comment https://github.com/broadinstitute/CellBender/issues/162#issuecomment-1411928306

Does that give you any hint about a way forward?

I am not very familiar with Singularity myself, but one piece of evidence I do have is that the official Docker image at

us.gcr.io/broad-dsde-methods/cellbender:0.3.0

is successfully used by WDL workflows (run via Terra) on Google Compute Engine VMs with GPUs.

I also have a WDL that tests whether CUDA is available in the running Docker container (which I test on Google Cloud Platform): https://github.com/broadinstitute/CellBender/blob/master/cellbender/remove_background/tests/benchmarking/docker_image_check_cuda_status.wdl. This test passes with the current Docker image.
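If it helps, here is a quick diagnostic you could run with the container's Python. This is only a sketch (the `cuda_status` helper is mine, not part of CellBender); it just wraps the same `torch.cuda` calls the CLI checks:

```python
def cuda_status():
    """Summarize CUDA availability as PyTorch sees it, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # False here means the NVIDIA driver is not visible inside the container
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }

if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` is False but `nvidia-smi` works on the host, the problem is usually how the container was launched rather than the PyTorch build itself.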

ryao-mdanderson commented 1 year ago

Hi Stephen @sjfleming, thank you very much for your quick response. I will review your suggestions and try them on our HPC cluster.

ryao-mdanderson commented 1 year ago

@sjfleming Thank you very much for the helpful tips. As soon as I saw #127, I realized it was my mistake: I did not pass the `--nv` option when running the Singularity container.

It is running now (not finished yet). I think this resolves my question.
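For the record, the working command just adds `--nv`, which tells Singularity to bind the host NVIDIA driver libraries into the container (same paths as above):

```console
$ singularity run --nv /risapps/singularity/repo/cellbender/0.3.0/cellbender.sif \
    cellbender remove-background \
    --cuda \
    --input CellBender/raw_feature_bc_matrix.h5 \
    --output test.h5
```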

sjfleming commented 1 year ago

Great news! I'm so glad @edg1983 posted that issue.