Hi Mulin,
Indeed, it may be due to insufficient GPU memory; 4000 images is significantly more than what I usually train with. You can try to subsample the images to a lower resolution as shown here. Alternatively, you can select a representative subset of images instead of using the full 4000.
The code is designed to load all images into GPU memory in order to speed up the creation of rays, so aim for something like ~200 images, depending on your resolution and your available VRAM.
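A minimal sketch of one way to pick a representative subset, assuming the images are plain files in a single folder and that keeping every Nth frame is acceptable; the paths and the stride below are placeholders, not part of the project:

```bash
#!/usr/bin/env bash
# Copy roughly every 20th image (4000 / 20 ≈ 200) into a smaller dataset folder.
# SRC, DST and STRIDE are placeholders; adjust them to your own paths and image count.
SRC=/path/to/full_dataset/images
DST=/path/to/subset_dataset/images
STRIDE=20

mkdir -p "$DST"
i=0
for f in "$SRC"/*; do
    if [ $((i % STRIDE)) -eq 0 ]; then
        cp "$f" "$DST"/
    fi
    i=$((i + 1))
done
```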
Hi @MulinYu , did you manage to run this project successfully? I'm not familiar with Docker, so I'm not sure whether the training command should be run from the host or from inside Docker, or how to read a dataset on the host from inside Docker.
Hi @CanCanZeng ,
The training commands should be run within Docker, so after running permuto_sdf/docker/run.sh.
In order to get the data inside the Docker container, you can mount one of your host volumes so it is visible inside Docker by adding something like --volume="<your_host_path>:<your_docker_path>:rw" to the run.sh script. You'll see that there are already some paths mounted, so you can just add your path there.
For more info on mounting volumes you can check this.
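As an illustration only, an extra mount added to a docker run invocation could look like the sketch below; the host path, container path, image name and surrounding flags are placeholders rather than the actual contents of run.sh:

```bash
# Hypothetical excerpt of a docker run command; only the --volume line is the point here.
docker run --gpus all -it \
    --volume="$HOME/datasets/professor:/workspace/input_data/professor:rw" \
    permuto_sdf_image
```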
I tried with 150 images, then decreased to 100 images, and used `--img_subsample` values from 2 to 14, but this command still doesn't work:
python3 run_custom_dataset.py --scene_scale 1.0 --scene_translation 0.0 0.0 0.0 --dataset_path /workspace/input_data/professor/ --img_subsample 14.0
I got two types of errors:
RuntimeError: CUDA out of memory. Tried to allocate 20.71 GiB (GPU 0; 9.77 GiB total capacity; 24.50 KiB already allocated; 8.66 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
5 0x7f2011ba0acd /workspace/easy_pbr/easypbr.cpython-38-x86_64-linux-gnu.so(+0x79acd) [0x7f2011ba0acd]
4 0x7f2011b8dbb7 /workspace/easy_pbr/easypbr.cpython-38-x86_64-linux-gnu.so(+0x66bb7) [0x7f2011b8dbb7]
3 0x7f2011b5fbfc /workspace/easy_pbr/easypbr.cpython-38-x86_64-linux-gnu.so(+0x38bfc) [0x7f2011b5fbfc]
2 0x7f201127cee1 easy_pbr::Viewer::Viewer(std::string) + 641
1 0x7f201127cc27 easy_pbr::Viewer::init_context() + 631
0 0x7f201127a2ae loguru::StreamLogger::~StreamLogger() + 126
[ AA639740] Viewer.cxx:421 FATL| GLFW could not initialize
Dear Radu,
Thanks for the code.
I tried training your model on my own dataset, which contains approximately 4,000 images. Unfortunately, I encountered the following error:
File "/workspace/permuto_sdf/permuto_sdf/permuto_sdf_py/utils/nerf_utils.py", line 507, in create_samples
    fg_ray_samples_packed=occupancy_grid.compute_samples_in_occupied_regions(ray_origins, ray_dirs, ray_t_entry, ray_t_exit, hyperparams.min_dist_between_samples, hyperparams.max_nr_samples_per_ray, jitter_samples)
RuntimeError: CUDA error: an illegal memory access was encountered
Could you please help me identify the source of this error? I suspect it might be related to insufficient GPU memory. If that's the case, do you have any suggestions on how to resolve this issue?
Thank you in advance for your assistance.
Best regards, Mulin