Open Deroline10 opened 5 months ago
Issue
While working on a computation-intensive project, I kept running into RuntimeError: CUDA error: unknown error. The error occurred on an NVIDIA GeForce GTX 1060 (6 GB) GPU, which seemed inadequate for the tasks I was running.
Solution
To address the problem, I upgraded my hardware: I rented a server with a more powerful GPU, an NVIDIA GeForce RTX 3090 (24 GB). This change provided a significant improvement in performance.
Here is my workflow; I hope it is helpful to others.
Step 1: Create the Environment
conda create -n CB python=3.7
Step 2: Activate the Environment
conda activate CB
Step 3: Install CUDA 11.7
conda install -c conda-forge cudatoolkit=11.7
Step 4: Install CUDA-Related Components
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
Step 5: Verify the Installation
python -c "import torch; print(torch.cuda.is_available())"
Step 6: Install CellBender
pip install cellbender
Step 7: Ensure HTML Report Generation Capability
pip install lxml_html_clean
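The one-line check in Step 5 only reports whether CUDA is visible. A slightly fuller check can also show which GPU PyTorch sees and whether a small kernel actually runs. The following is a minimal sketch, not part of the original workflow; the function name check_cuda and the 64x64 test size are arbitrary choices for illustration.

```python
def check_cuda():
    """Return a dict describing the PyTorch/CUDA setup seen by this interpreter."""
    info = {"torch_installed": False, "cuda_available": False}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        props = torch.cuda.get_device_properties(0)
        info["total_memory_gb"] = round(props.total_memory / 1024**3, 1)
        # Run a tiny matmul to confirm that kernels launch on the device
        a = torch.rand(64, 64, device="cuda")
        info["matmul_ok"] = bool(torch.isfinite(a @ a.t()).all().item())
    return info


if __name__ == "__main__":
    for key, value in check_cuda().items():
        print(f"{key}: {value}")
```

If cuda_available is False, or the device name is not the GPU you expect, it is probably worth revisiting Steps 3 and 4 before running CellBender.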
Log of the run that produced the error:
(clean_cellbender) derry@PC-20231102INBT:/mnt/d/cellranger$ cellbender remove-background --cuda --input raw_feature_bc_matrix.h5 --output test_output.h5 --epochs 10
cellbender:remove-background: Command: cellbender remove-background --cuda --input raw_feature_bc_matrix.h5 --output test_output.h5 --epochs 10
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 504ba9439b)
cellbender:remove-background: 2024-06-10 10:09:21
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 33696 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 25615 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 1964
cellbender:remove-background: Prior on counts for empty droplets is 66
cellbender:remove-background: Excluding 8574 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 17041 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 33
cellbender:remove-background: Using 5926 probable cell barcodes, plus an additional 12999 barcodes, and 63533 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 72 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmp_h3larh4
cellbender:remove-background: Successfully unpacked tarball to /tmp/tmp_h3larh4
/tmp/tmp_h3larh4/75bc50b6be_train.loaderstate
/tmp/tmp_h3larh4/75bc50b6be_test.loaderstate
/tmp/tmp_h3larh4/75bc50b6be_random.pyro
/tmp/tmp_h3larh4/75bc50b6be_optim.torch
/tmp/tmp_h3larh4/posterior.h5
/tmp/tmp_h3larh4/75bc50b6be_optim.pyro
/tmp/tmp_h3larh4/75bc50b6be_args.npy
/tmp/tmp_h3larh4/75bc50b6be_random.cuda
/tmp/tmp_h3larh4/75bc50b6be_params.pyro
/tmp/tmp_h3larh4/75bc50b6be_model.torch
cellbender:remove-background: Workflow hash does not match that of checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001] average training loss: 3500.7271
cellbender:remove-background: [epoch 002] average training loss: 2679.1638 (36.4 seconds per epoch)
cellbender:remove-background: Will not checkpoint due to projected run completion in under 7.0 min
cellbender:remove-background: [epoch 003] average training loss: 2639.3804
cellbender:remove-background: [epoch 004] average training loss: 2630.4355
cellbender:remove-background: [epoch 005] average training loss: 2620.5316
cellbender:remove-background: [epoch 005] average test loss: 2615.8788
cellbender:remove-background: [epoch 006] average training loss: 2604.2858
cellbender:remove-background: [epoch 007] average training loss: 2582.9639
cellbender:remove-background: [epoch 008] average training loss: 2578.5009
cellbender:remove-background: [epoch 009] average training loss: 2571.8616
cellbender:remove-background: [epoch 010] average training loss: 2569.8371
cellbender:remove-background: [epoch 010] average test loss: 2571.3370
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /mnt/d/cellranger/ckpt.tar.gz
cellbender:remove-background: 2024-06-10 10:18:59
cellbender:remove-background: Inference procedure complete.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmplylrvm7q
cellbender:remove-background: Successfully unpacked tarball to /tmp/tmplylrvm7q
/tmp/tmplylrvm7q/504ba9439b_args.npy
/tmp/tmplylrvm7q/504ba9439b_optim.torch
/tmp/tmplylrvm7q/504ba9439b_train.loaderstate
/tmp/tmplylrvm7q/504ba9439b_random.cuda
/tmp/tmplylrvm7q/504ba9439b_params.pyro
/tmp/tmplylrvm7q/504ba9439b_optim.pyro
/tmp/tmplylrvm7q/504ba9439b_model.torch
/tmp/tmplylrvm7q/504ba9439b_random.pyro
/tmp/tmplylrvm7q/504ba9439b_test.loaderstate
cellbender:remove-background: Posterior not currently included in checkpoint.
cellbender:remove-background: Computing posterior noise count probabilities in mini-batches.
cellbender:remove-background: Working on chunk (1/99)
cellbender:remove-background: [0.09 mins per chunk]
cellbender:remove-background: Working on chunk (2/99)
cellbender:remove-background: Working on chunk (3/99)
......
cellbender:remove-background: Working on chunk (96/99)
cellbender:remove-background: Working on chunk (97/99)
cellbender:remove-background: Working on chunk (98/99)
cellbender:remove-background: Working on chunk (99/99)
cellbender:remove-background: Writing full posterior to test_output_posterior.h5
cellbender:remove-background: Succeeded in writing posterior to file test_output_posterior.h5
cellbender:remove-background: Added posterior object to checkpoint file.
cellbender:remove-background: 2024-06-10 10:32:51
cellbender:remove-background: Saved summary plots as test_output.pdf
cellbender:remove-background: Saved cell barcodes in test_output_cell_barcodes.csv
cellbender:remove-background: Computing target noise counts per gene for MCKP estimator
Traceback (most recent call last):
  File "/home/derry/miniconda3/envs/clean_cellbender/bin/cellbender", line 10, in
    sys.exit(main())
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/base_cli.py", line 118, in main
    cli_dict[args.tool].run(args)
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 185, in run
    return main(args)
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/cli.py", line 230, in main
    posterior = run_remove_background(args)
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/run.py", line 133, in run_remove_background
    file_name=file_name,
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/run.py", line 237, in compute_output_denoised_counts_reports_metrics
    per_gene=True,
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/posterior.py", line 1579, in compute_mean_target_removal_as_function
    device=device,
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/estimation.py", line 147, in estimate_noise
    device=device)
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/estimation.py", line 822, in apply_function_dense_chunks
    s = fun(dense_tensor, kwargs)
  File "/home/derry/miniconda3/envs/clean_cellbender/lib/python3.7/site-packages/cellbender/remove_background/estimation.py", line 143, in _torch_mean
    return torch.matmul(x.exp(), c.t())
RuntimeError: CUDA error: unknown error
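The traceback ends in a torch.matmul(x.exp(), c.t()) call. One way to check whether a GPU can run that kind of operation at all, outside CellBender, is a small standalone script like the sketch below. It is only an illustration under assumptions: the helper name try_exp_matmul and the tensor shapes are made up here and do not reflect the shapes CellBender uses. Setting CUDA_LAUNCH_BLOCKING=1 before CUDA is initialized makes kernel launches synchronous, so an error is more likely to be reported at the call that caused it.

```python
import os

# Must be set before CUDA is initialized so that kernel launches run synchronously
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def try_exp_matmul(rows=1024, cols=512, device=None):
    """Run matmul(x.exp(), c.t()) on random tensors; return (ok, message).

    The shapes are arbitrary illustration values, not those used by CellBender.
    """
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    try:
        x = torch.randn(rows, cols, device=device)
        c = torch.rand(1, cols, device=device)
        out = torch.matmul(x.exp(), c.t())
        if str(device).startswith("cuda"):
            torch.cuda.synchronize()  # surface any pending asynchronous CUDA error
        return True, f"ok on {device}, output shape {tuple(out.shape)}"
    except RuntimeError as err:
        return False, f"failed on {device}: {err}"


if __name__ == "__main__":
    print(try_exp_matmul())
```

If this small case succeeds on the GPU while the CellBender run still fails, the cause may lie in memory limits or the driver setup rather than in the operation itself, though this script alone cannot confirm that.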