CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown)

windingwind commented 2 years ago

Hi! I ran into this CUDA error while running render_to_screen.sh:

CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)"
render_to_screen.sh: line 3: 28062 Segmentation fault (core dumped) python run_nerf.py cfgs/paper/finetune/$DATASET.yaml -rcfg cfgs/render/render_to_screen.yaml

I'm running KiloNeRF on Ubuntu 18.04 with CUDA 11.1, and the GPU is an RTX A6000. Could you please help me with this? Thank you very much!

Here's the output:

(kilonerf) nesc525@nesc525:~/drivers/5/kilonerf$ bash render_to_screen.sh
auto log path: logs/paper/finetune/Synthetic_NeRF_Lego
{'checkpoint_interval': 50000, 'chunk_size': 40000, 'distilled_cfg_path': 'cfgs/paper/distill/Synthetic_NeRF_Lego.yaml', 'distilled_checkpoint_path': 'logs/paper/distill/Synthetic_NeRF_Lego/checkpoint.pth', 'initial_learning_rate': 0.001, 'iterations': 1000000, 'l2_regularization_lambda': 1e-06, 'learing_rate_decay_rate': 500, 'no_batching': True, 'num_rays_per_batch': 8192, 'num_samples_per_ray': 384, 'occupancy_cfg_path': 'cfgs/paper/pretrain_occupancy/Synthetic_NeRF_Lego.yaml', 'occupancy_log_path': 'logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth', 'perturb': 1.0, 'precrop_fraction': 0.5, 'precrop_iterations': 0, 'raw_noise_std': 0.0, 'render_only': False, 'no_color_sigmoid': False, 'render_test': True, 'render_factor': 0, 'testskip': 8, 'deepvoxels_shape': 'greek', 'blender_white_background': True, 'blender_half_res': False, 'llff_factor': 8, 'llff_no_ndc': False, 'llff_lindisp': False, 'llff_spherify': False, 'llff_hold': False, 'print_interval': 100, 'render_testset_interval': 10000, 'render_video_interval': 100000000, 'network_chunk_size': 65536, 'rng_seed': 0, 'use_same_initialization_for_all_networks': False, 'use_initialization_fix': False, 'num_importance_samples_per_ray': 0, 'model_type': 'multi_network', 'random_direction_probability': -1, 'von_mises_kappa': -1, 'view_dependent_dropout_probability': -1}
Using GPU: RTX A6000
/home/nesc525/drivers/5/kilonerf/utils.py:254: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array([[float(w) for w in line.strip().split()] for line in open(path)]).astype(np.float32)
Loaded a NSVF-style dataset (138, 800, 800, 4) (138, 4, 4) (0,) data/nsvf/Synthetic_NeRF/Lego
(100,) (13,) (25,)
Converting alpha to white.
global_domain_min: [-0.67 -1.2  -0.37], global_domain_max: [0.67 1.2  1.03], near: 2.0, far: 6.0, background_color: tensor([1., 1., 1.])
Loading logs/paper/finetune/Synthetic_NeRF_Lego/checkpoint_1000000.pth
Loading occupancy grid from logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth
CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)" 
render_to_screen.sh: line 3: 28062 Segmentation fault      (core dumped) python run_nerf.py cfgs/paper/finetune/$DATASET.yaml -rcfg cfgs/render/render_to_screen.yaml

windingwind commented 2 years ago

Following the suggestion here: https://forums.developer.nvidia.com/t/cudaerrorunknown-cudagraphicsglregisterbuffer/64406/12

After adding __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia, the output is shown below. (I tried with and without these environment variables on different GPUs, including the A6000 and the 3090.)

(kilonerf) nesc525@nesc525:~/drivers/5/kilonerf$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia CUDA_VISIBLE_DEVICES=1 bash render_to_screen.sh 
auto log path: logs/paper/finetune/Synthetic_NeRF_Lego
{'checkpoint_interval': 50000, 'chunk_size': 40000, 'distilled_cfg_path': 'cfgs/paper/distill/Synthetic_NeRF_Lego.yaml', 'distilled_checkpoint_path': 'logs/paper/distill/Synthetic_NeRF_Lego/checkpoint.pth', 'initial_learning_rate': 0.001, 'iterations': 1000000, 'l2_regularization_lambda': 1e-06, 'learing_rate_decay_rate': 500, 'no_batching': True, 'num_rays_per_batch': 8192, 'num_samples_per_ray': 384, 'occupancy_cfg_path': 'cfgs/paper/pretrain_occupancy/Synthetic_NeRF_Lego.yaml', 'occupancy_log_path': 'logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth', 'perturb': 1.0, 'precrop_fraction': 0.5, 'precrop_iterations': 0, 'raw_noise_std': 0.0, 'render_only': False, 'no_color_sigmoid': False, 'render_test': True, 'render_factor': 0, 'testskip': 8, 'deepvoxels_shape': 'greek', 'blender_white_background': True, 'blender_half_res': False, 'llff_factor': 8, 'llff_no_ndc': False, 'llff_lindisp': False, 'llff_spherify': False, 'llff_hold': False, 'print_interval': 100, 'render_testset_interval': 10000, 'render_video_interval': 100000000, 'network_chunk_size': 65536, 'rng_seed': 0, 'use_same_initialization_for_all_networks': False, 'use_initialization_fix': False, 'num_importance_samples_per_ray': 0, 'model_type': 'multi_network', 'random_direction_probability': -1, 'von_mises_kappa': -1, 'view_dependent_dropout_probability': -1}
Using GPU: GeForce RTX 3090
/home/nesc525/drivers/5/kilonerf/utils.py:254: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array([[float(w) for w in line.strip().split()] for line in open(path)]).astype(np.float32)
Loaded a NSVF-style dataset (138, 800, 800, 4) (138, 4, 4) (0,) data/nsvf/Synthetic_NeRF/Lego
(100,) (13,) (25,)
Converting alpha to white.
global_domain_min: [-0.67 -1.2  -0.37], global_domain_max: [0.67 1.2  1.03], near: 2.0, far: 6.0, background_color: tensor([1., 1., 1.])
Loading logs/paper/finetune/Synthetic_NeRF_Lego/checkpoint_1000000.pth
Loading occupancy grid from logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  31
  Current serial number in output stream:  32
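
For anyone who wants to repeat the environment-variable experiment from a script, here is a minimal Python sketch. The two offload variables are the ones from the command line above; the helper name `with_prime_offload` and its parameters are hypothetical and not part of KiloNeRF. It only builds the environment, and whether the variables help depends on the driver and display setup.

```python
import os

# The two variables used in the command above (from the linked NVIDIA forum thread).
PRIME_OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def with_prime_offload(base_env=None, cuda_device=None):
    """Return a copy of the environment with the offload variables set.

    base_env: mapping to start from (defaults to the current process environment).
    cuda_device: optional index to expose through CUDA_VISIBLE_DEVICES.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(PRIME_OFFLOAD_VARS)
    if cuda_device is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(cuda_device)
    return env
```

The result could then be passed to something like `subprocess.run(["bash", "render_to_screen.sh"], env=with_prime_offload(cuda_device=1))`, which mirrors the command shown above.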

windingwind commented 2 years ago

I just switched to another machine (Ubuntu 20.04, NVIDIA-SMI 460.67, CUDA Version: 11.2, RTX 3090) and ran bash render_to_screen.sh. The error message turns out to be the same.

The error seems to be related to GLUT. However, I tested GLUT with a separate ray tracing program and the result displayed correctly, so everything seems to work except the KiloNeRF render code. TAT

Quyans commented 2 years ago

I ran into the same problem. It seems like the author hard-coded an absolute path from his own machine in the CUDA extension, and we don't have /home/chris/anti.
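
To see what the message itself reports, it can be split into its fields. The sketch below is only a reading aid: the regular expression is written against the exact error line quoted in this thread, and the function name `parse_cuda_error` is hypothetical. My assumption, not confirmed by the author, is that the file path and line number describe where the failing call sits in the source that was compiled, so the path may not need to exist on the machine running the code.

```python
import re

# Pattern matching the error line quoted in this issue.
CUDA_ERROR_RE = re.compile(
    r'CUDA error at (?P<file>.+?):(?P<line>\d+) '
    r'code=(?P<code>\d+)\((?P<name>\w+)\) "(?P<call>.*)"'
)


def parse_cuda_error(message):
    """Return the fields of the error line as a dict, or None if it does not match."""
    match = CUDA_ERROR_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["code"] = int(fields["code"])
    return fields
```

Applied to the line from this issue, the failing call is `cudaGraphicsGLRegisterBuffer`, which registers an OpenGL buffer with CUDA, so the OpenGL side of the setup looks like a reasonable place to check.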

Quyans commented 2 years ago

Hey, I just got it working. I used a physical monitor connected directly to the GPU. I guess it can't be used remotely.
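
A rough way to tell whether a session is likely remote is to look at the environment. The sketch below is a heuristic under stated assumptions: OpenSSH sets `SSH_CONNECTION`, `SSH_CLIENT`, or `SSH_TTY` in SSH sessions, and X11 forwarding typically gives a `DISPLAY` such as `localhost:10.0`. The function name `looks_remote` is hypothetical, and a local X server can still be used over SSH by setting `DISPLAY` by hand, so treat the result as a hint only.

```python
import os

SSH_MARKERS = ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY")


def looks_remote(env=None):
    """Heuristically guess whether the session is remote or X11-forwarded."""
    env = os.environ if env is None else env
    over_ssh = any(name in env for name in SSH_MARKERS)
    display = env.get("DISPLAY") or ""
    # Forwarded X11 displays usually point at localhost with a high display number.
    forwarded = display.startswith("localhost:")
    return over_ssh or forwarded
```

Calling `looks_remote()` before launching the renderer gives a quick warning that no locally attached display may be in use.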

windingwind commented 2 years ago

> Hey, I just got it working. I used a physical monitor connected directly to the GPU. I guess it can't be used remotely.

I tried with a physical monitor and got the same error.

Quyans commented 2 years ago

Did you connect the monitor to the integrated graphics? You are supposed to connect it directly to the discrete graphics card.
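
One way to check which GPU the display is actually using is to look at the OpenGL vendor string. The sketch below assumes the `glxinfo` tool (from mesa-utils on Ubuntu) is installed and that its output has an `OpenGL vendor string:` line. The helper names are hypothetical. The idea, stated cautiously, is that CUDA/OpenGL interop generally needs the OpenGL context to live on the NVIDIA GPU, so a non-NVIDIA vendor here would be a likely suspect.

```python
import re
import shutil
import subprocess


def parse_gl_vendor(glxinfo_text):
    """Extract the 'OpenGL vendor string' value from glxinfo output, or None."""
    match = re.search(r"^OpenGL vendor string:\s*(.+)$", glxinfo_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def current_gl_vendor():
    """Run 'glxinfo -B' and return the vendor string, or None if unavailable."""
    if shutil.which("glxinfo") is None:
        return None  # glxinfo is not installed
    result = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True)
    return parse_gl_vendor(result.stdout)


def gl_is_nvidia(vendor):
    """True if the vendor string mentions NVIDIA."""
    return vendor is not None and "nvidia" in vendor.lower()
```

If `gl_is_nvidia(current_gl_vendor())` is false, the monitor is probably driven by the integrated graphics or a software renderer rather than the discrete card.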

windingwind commented 2 years ago

> Did you connect the monitor to the integrated graphics? You are supposed to connect it directly to the discrete graphics card.

It was connected to the A6000. I'll try other GPUs later. Thanks!

Quyans commented 2 years ago

Email received.

Ataraxiaecho commented 10 months ago

Hello, I'm currently experiencing the same problem, how did you solve it?

creiser / kilonerf

CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) #22