Open ryao-mdanderson opened 1 year ago
Hi @ryao-mdanderson, I am not sure exactly what is causing your issue.
But let me point you to these two issues in case they are of any help,
specifically this comment: https://github.com/broadinstitute/CellBender/issues/162#issuecomment-1411928306
Does that give you any hint about a way forward?
I am not very familiar with Singularity myself. But one piece of evidence I do have is that the official Docker image at
us.gcr.io/broad-dsde-methods/cellbender:0.3.0
is successfully being used by workflows which run (via WDL workflows in Terra) on Google Compute Engine VMs with GPUs.
I also have a WDL that checks whether CUDA is available inside the running Docker container (I run it on Google Cloud Platform): https://github.com/broadinstitute/CellBender/blob/master/cellbender/remove_background/tests/benchmarking/docker_image_check_cuda_status.wdl
This test passes with the current Docker image.
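For anyone who wants a quick check without WDL, here is a minimal sketch of the same kind of test in Python. It is not the contents of the WDL linked above; the function name `cuda_status` is made up for illustration, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA device.

    Hypothetical helper for illustration; not part of CellBender.
    """
    status = {"torch_installed": False, "cuda_available": False, "device_count": 0}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment, so nothing more to check.
        return status

    status["torch_installed"] = True
    status["cuda_available"] = bool(torch.cuda.is_available())
    if status["cuda_available"]:
        status["device_count"] = torch.cuda.device_count()
    return status


if __name__ == "__main__":
    print(cuda_status())
```

Running this with the container's Python (for example from a shell inside the container) should show whether the GPU is visible to PyTorch there, which is the condition that `cellbender remove-background --cuda` asserts on.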
Hi Stephen @sjfleming, thank you very much for your quick response. I will review your suggestion and try it on our HPC cluster.
@sjfleming Thank you very much for your helpful tips. As soon as I saw post #127, I realized the mistake was mine: I did not pass the --nv option when running this Singularity container.
It is running now (not finished yet). I think this resolves my question.
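For later readers, here is a rough sketch of the corrected invocation, written as a small Python wrapper so the only difference from the failing command is easy to see. The image path and CellBender arguments are copied from the original post; `build_command` and `run_in_container` are hypothetical helper names, and `--nv` is Singularity's option for exposing the host's NVIDIA driver and GPUs to the container.

```python
import shlex
import subprocess


def build_command(image, tool_args, use_gpu=True):
    """Build a `singularity run` command, adding --nv when GPU access is wanted."""
    cmd = ["singularity", "run"]
    if use_gpu:
        # Without --nv the container cannot see the host GPU, so
        # torch.cuda.is_available() is False inside it.
        cmd.append("--nv")
    cmd.append(image)
    cmd.extend(tool_args)
    return cmd


def run_in_container(cmd):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(cmd, check=True)


# Paths and arguments taken from the original post in this thread.
IMAGE = "/risapps/singularity/repo/cellbender/0.3.0/cellbender.sif"
TOOL_ARGS = [
    "cellbender", "remove-background", "--cuda",
    "--input", "CellBender/raw_feature_bc_matrix.h5",
    "--output", "test.h5",
]

if __name__ == "__main__":
    # Print the command only; call run_in_container(...) on a GPU node to execute it.
    print(shlex.join(build_command(IMAGE, TOOL_ARGS)))
```

The wrapper only prints the command by default, so it can be inspected on a login node before being run on a GPU node.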
Great news! I'm so glad @edg1983 posted that issue
Dear CellBender support team:
I followed the "Using The Official Docker Image" section to pull a GPU-enabled container, and I am trying to use it on our HPC cluster.
On a GPU node with the CUDA toolkit 11.5 module loaded (I also tested with CUDA 11.2), I ran this command:
$ singularity run /risapps/singularity/repo/cellbender/0.3.0/cellbender.sif cellbender remove-background --cuda --input CellBender/raw_feature_bc_matrix.h5 --output test.h5
It failed with the following error (note: I got a test input file from the user):
Traceback (most recent call last):
File "/opt/conda/bin/cellbender", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.7/site-packages/cellbender/base_cli.py", line 120, in main
args = cli_dict[args.tool].validate_args(args)
File "/opt/conda/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 80, in validate_args
assert torch.cuda.is_available(), "Trying to use CUDA, " \
AssertionError: Trying to use CUDA, but CUDA is not available.
I verified inside the container that torch.cuda.is_available() returns False:
$ singularity run /risapps/singularity/repo/cellbender/0.3.0/cellbender.sif /bin/bash
Singularity> which python
/opt/conda/bin/python
Is the PyTorch build in the 0.3.0 container GPU-enabled? Do you have any suggestions for fixing this while still using the container?
Thank you for your help, Rong Yao