CUDA error: an illegal memory access was encountered

CrescentRosexx commented 5 months ago

Thank you for your nice work! I am using WSL2 on Windows 11, though I don't think that's the cause. I get the CUDA error below; does anyone know what's going on?


```
/home/jx/miniconda3/envs/gs3d/lib/python3.10/site-packages/timm/models/_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
  model = create_fn(
Using cache found in /home/jx/.cache/torch/hub/intel-isl_MiDaS_master
[1000, 2000, 3000, 5000, 10000]
Optimizing output/horns
Output folder: output/horns [08/01 13:59:54]
Reading camera 62/62 [08/01 13:59:54]
0it [00:00, ?it/s]6.323975610733033 cameras_extent [08/01 13:59:54]
Loading Training Cameras [08/01 13:59:54]
3it [00:00,  3.65it/s]
0it [00:00, ?it/s]Loading Test Cameras [08/01 13:59:55]
8it [00:01,  7.02it/s]
Number of points at initialisation :  37399 [08/01 14:00:01]
Training progress:   0%|          | 0/10000 [00:00<?, ?it/s]/home/jx/miniconda3/envs/gs3d/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The variance of predictions or target is close to zero. This can cause instability in Pearson correlationcoefficient, leading to wrong results. Consider re-scaling the input if possible or computing using alarger dtype (currently using torch.float32).
  warnings.warn(*args, **kwargs)  # noqa: B028
Traceback (most recent call last):
  File "/mnt/c/Programs/PyCharmWorkplace/FSGS-main/train.py", line 279, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args)
  File "/mnt/c/Programs/PyCharmWorkplace/FSGS-main/train.py", line 97, in training
    loss = ((1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image)))
  File "/mnt/c/Programs/PyCharmWorkplace/FSGS-main/utils/loss_utils.py", line 49, in ssim
    window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Training progress:   0%|          | 0/10000 [00:01<?, ?it/s]
ERROR conda.cli.main_run:execute(124): `conda run python train.py --source_path dataset/nerf_llff_data/horns --model_path output/horns --eval --n_views 3 --sample_pseudo_interval 1` failed. (See above for error)

Process finished with exit code 1
```
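The log itself suggests `CUDA_LAUNCH_BLOCKING=1`: CUDA kernels are launched asynchronously, so the error surfacing at `window.cuda(...)` in `ssim()` may actually have been triggered by an earlier kernel. A minimal sketch of re-running the same command with that variable set, assuming a child process is acceptable; `blocking_env` and `rerun_blocking` are hypothetical helper names, not part of FSGS:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional, Sequence


def blocking_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment with CUDA_LAUNCH_BLOCKING=1 added.

    With blocking launches each CUDA kernel runs synchronously, so an illegal
    memory access should be raised by the kernel that caused it rather than
    by a later call such as `window.cuda(...)` inside ssim().
    """
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(script_args: Sequence[str]) -> int:
    """Run `python <script_args...>` in a fresh process with blocking launches.

    A child process is used so the variable is already set before PyTorch
    initialises CUDA; setting it afterwards in the same process may have no effect.
    """
    completed = subprocess.run([sys.executable, *script_args], env=blocking_env())
    return completed.returncode
```

For the failing run above this would be `rerun_blocking(["train.py", "--source_path", "dataset/nerf_llff_data/horns", "--model_path", "output/horns", "--eval", "--n_views", "3", "--sample_pseudo_interval", "1"])`, which is equivalent to prefixing the shell command with `CUDA_LAUNCH_BLOCKING=1`. Expect training to run noticeably slower while the variable is set.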

leo-frank commented 5 months ago

I get the same problem on a native Ubuntu 20.04 machine.

runningpp commented 5 months ago

I get the same problem.

zehaozhu commented 5 months ago

Hello,

Thanks for your interest in our work.

You may refer to these issues https://github.com/graphdeco-inria/gaussian-splatting/issues/462 and https://github.com/graphdeco-inria/gaussian-splatting/issues/222

zehaozhu commented 5 months ago

@CrescentRosexx @leo-frank @runningpp This issue often occurs with older CUDA versions. Try using CUDA 11.7.
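Since the advice depends on which CUDA version is actually in use, it can help to report it explicitly. `torch.version.cuda` gives the CUDA version PyTorch was built with, which is not necessarily the same as the toolkit reported by `nvcc`. A diagnostic sketch under that assumption; `nvcc_release` and `cuda_env_report` are hypothetical helper names, and the parser assumes the usual `Cuda compilation tools, release X.Y, ...` line in `nvcc --version` output:

```python
import importlib
import shutil
import subprocess
from typing import Dict, Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract e.g. '11.8' from the 'Cuda compilation tools, release 11.8, V11.8.89' line."""
    for line in nvcc_output.splitlines():
        if "release" in line:
            # " 11.8, V11.8.89" -> "11.8"
            return line.split("release", 1)[1].split(",", 1)[0].strip()
    return None


def cuda_env_report() -> Dict[str, Optional[str]]:
    """Collect the versions compared in this thread; missing pieces stay None."""
    report: Dict[str, Optional[str]] = {
        "torch": None,       # PyTorch version string
        "torch_cuda": None,  # CUDA version PyTorch was built with (None for CPU-only builds)
        "gpu": None,         # name of device 0, only if CUDA is usable
        "nvcc": None,        # release of the nvcc found on PATH, if any
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        torch = None
    if torch is not None:
        report["torch"] = str(torch.__version__)
        report["torch_cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        report["nvcc"] = nvcc_release(result.stdout)
    return report
```

Printing `cuda_env_report()` inside the training environment makes reports like "CUDA 11.7" or "CUDA 11.8" below less ambiguous, since it shows both the PyTorch build and the local toolkit.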

lidonghaoharry commented 4 months ago

I get the same issue with CUDA 11.8. Has anyone figured out a solution yet? Thanks!

Penguin-jpg commented 4 months ago

I encountered this problem with CUDA 11.7 and 11.8. (I can run the original Gaussian splatting without issues.) Has this issue been solved?

lidonghaoharry commented 3 months ago

@Penguin-jpg I got it working in Docker with CUDA 11.7.

chenkangjie1123 commented 3 months ago

> @Penguin-jpg I got it worked in Docker with CUDA 11.7.

How did you use Docker to solve this problem? I have never used Docker before.

Penguin-jpg commented 3 months ago

> I get the same issue with CUDA 11.8. Has anyone figured out a solution yet? Thanks!

I think you could try installing the CUDA toolkit with conda: `conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit`.

CrescentRosexx commented 3 months ago

@ckjCEO @Penguin-jpg I got it working by installing a newer version of PyTorch with conda instead of pip, which I had used previously. I'm not sure this is the actual reason, but it worked. I used the official PyTorch command: `conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia`
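To confirm that an environment actually picked up the versions pinned in that command (PyTorch 2.1.1 built for CUDA 11.8), one can compare `torch.__version__` and `torch.version.cuda` against them. A minimal sketch; `base_version` and `matches_pinned` are hypothetical names, and stripping the local label assumes pip wheels from the PyTorch index report versions such as `2.1.1+cu118` while conda builds report a plain `2.1.1`:

```python
from typing import Optional


def base_version(version: str) -> str:
    """Drop a PEP 440 local label such as '+cu118' from a version string."""
    return version.split("+", 1)[0]


def matches_pinned(torch_version: str, torch_cuda: Optional[str],
                   want_torch: str = "2.1.1", want_cuda: str = "11.8") -> bool:
    """True if a PyTorch build reports the pinned release and CUDA version."""
    if torch_cuda is None:  # CPU-only build: torch.version.cuda is None
        return False
    return base_version(str(torch_version)) == want_torch and torch_cuda == want_cuda
```

In the training environment this would be called as `matches_pinned(torch.__version__, torch.version.cuda)` after `import torch`; a `False` result suggests a different PyTorch build is being imported than the one just installed.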

VITA-Group / FSGS

CUDA error: an illegal memory access was encountered #23