NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

Impossible to use GUI #178

Closed ArpegorPSGH closed 1 year ago

ArpegorPSGH commented 1 year ago

Hello, I installed Kaolin Wisp a few days ago on Ubuntu 20.04 (it crashes during compilation on Windows). I followed the Quickstart installation exactly (I literally copy-pasted the content). I ran into several problems, which I managed to solve, but I have now been stuck for a day on an intermittent error: it almost never occurs at the same iteration, but it is always raised from the same line of code. The error only happens when I launch the app with the GUI; without the GUI everything works fine (although it is quite slow, taking 20-30 s to validate one FHD image). Note that even when the GUI launches successfully, each refresh takes over a second and nothing in the GUI responds (I can click buttons or change values in fields, but nothing happens afterwards). Here is the console log:

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia python app/nerf/main_nerf.py --dataset-path training_data/fox --config app/nerf/configs/nerf_hash.yaml
apex import failed. apex optimizer will not be available
blas
  constructor: OctreeAS.make_dense
  level: 7
grid
  constructor: HashGrid.from_geometric
  feature_dim: 2
  num_lods: 16
  multiscale_type: cat
  feature_std: 1e-09
  feature_bias: 0.0
  codebook_bitwidth: 19
  min_grid_res: 16
  max_grid_res: 1024
nef
  constructor: NeuralRadianceField
  pos_embedder: none
  view_embedder: positional
  pos_multires: 10
  view_multires: 4
  position_input: False
  activation_type: relu
  layer_type: linear
  hidden_dim: 64
  num_layers: 1
  bias: True
  prune_density_decay: 0.95
  prune_min_density: 2.956033378250884
tracer
  constructor: PackedRFTracer
  raymarch_type: uniform
  num_steps: 512
  step_size: 1.0
  bg_color: (0.0, 0.0, 0.0)
dataset
  constructor: NeRFSyntheticDataset
  dataset_path: training_data/fox
  split: train
  bg_color: (0.0, 0.0, 0.0)
  mip: 0
  dataset_num_workers: -1
  transform: None
dataset_transform
  constructor: SampleRays
  num_samples: 4096
trainer
  optimizer
    constructor: Adam
    lr: 0.001
    betas: (0.9, 0.999)
    eps: 1e-16
    weight_decay: 1e-06
  dataloader
    batch_size: 1
    num_workers: 0
  exp_name: nerf-hash
  mode: train
  max_epochs: 10
  save_every: -1
  save_as_new: False
  model_format: full
  render_every: -1
  valid_every: -1
  valid_split: test
  enable_amp: True
  profile_nvtx: False
  grid_lr_weight: 500.0
  scheduler: True
  scheduler_milestones: (0.5, 0.75, 0.9)
  scheduler_gamma: 0.333
  valid_metrics: ('psnr',)
  start_prune: 1000
  prune_every: 100
  random_lod: False
  rgb_lambda: 1.0
  opacity_loss: 0.0
  rgb_loss_type: huber
  rgb_loss_denom: rays
  target_sample_size: 262144
  save_valid_imgs: False
tracker
  tensorboard
    constructor: _Tensorboard
    log_dir: _results/logs/runs
    exp_name: None
    log_fname: None
  wandb
    constructor: _WandB
    entity: None
    project: wisp-nerf
    group: None
    run_name: None
    job_type: train
    sync_tensorboard: True
  visualizer
    constructor: OfflineRenderer
    render_res: (1024, 1024)
    render_batch: 10000
    shading_mode: rb
    matcap_path: ./data/matcap/Pearl.png
    shadow: False
    ao: False
    perf: False
  vis_camera
    camera_origin: (-3.0, 0.65, -3.0)
    camera_lookat: (0.0, 0.0, 0.0)
    camera_fov: 30.0
    camera_clamp: (0.0, 10.0)
    viz360_num_angles: 20
    viz360_radius: 3.0
    viz360_render_all_lods: False
  enable_tensorboard: True
  enable_wandb: False
  log_dir: _results/logs/runs
log_level: 20
pretrained: None
device: cuda
interactive: True
loading data: 100%|████████████████████████████| 33/33 [00:00<00:00, 272.54it/s]
2023-09-25 16:51:45,977|    INFO| WARNING: The dataset expects distortion correction, but the current implementation does not handle this.
/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2023-09-25 16:51:48,247|    INFO| Using NVIDIA RTX A4000 Laptop GPU with CUDA v11.3
2023-09-25 16:51:48,247|    INFO| Total number of parameters: 11431941
[i] Using PYGLFW_IMGUI (GL 3.3)
2023-09-25 16:51:48,638|    INFO| [i] Using PYGLFW_IMGUI (GL 3.3)
[i] Running at 60 frames/second
2023-09-25 16:51:48,654|    INFO| [i] Running at 60 frames/second
Traceback (most recent call last):
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/app/nerf/main_nerf.py", line 133, in <module>
    app.run()  # Run in interactive mode
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 267, in run
    app.run()   # App clock should always run as frequently as possible (background tasks should not be limited)
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/glumpy/app/__init__.py", line 362, in run
    run(duration, framecount)
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/glumpy/app/__init__.py", line 344, in run
    count = __backend__.process(dt)
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/glumpy/app/window/backends/backend_glfw_imgui.py", line 448, in process
    window.dispatch_event('on_draw', dt)
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/glumpy/app/window/event.py", line 396, in dispatch_event
    if getattr(self, event_type)(*args):
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 557, in on_draw
    self.render()     # Render objects uploaded to GPU
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 36, in _enable_amp
    return func(self, *args, **kwargs)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 525, in render
    img, depth_img = self.render_canvas(self.render_core, dt, self.canvas_dirty)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 414, in render_canvas
    renderbuffer = render_core.render(time_delta, force_render)
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/core/render_core.py", line 31, in _enable_amp
    return func(self, *args, **kwargs)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/core/render_core.py", line 223, in render
    rb = self._render_payload(payload, force_render)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/core/render_core.py", line 342, in _render_payload
    rb = renderer.render(in_rays)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/renderer/core/renderers/radiance_pipeline_renderer.py", line 71, in render
    rb += self.tracer(self.nef,
  File "/home/greau-hamard/anaconda3/envs/wisp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/tracers/base_tracer.py", line 161, in forward
    rb = self.trace(nef, rays, requested_channels, requested_extra_channels, **input_args)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/tracers/packed_rf_tracer.py", line 117, in trace
    raymarch_results = nef.grid.raymarch(rays,
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/models/grids/hash_grid.py", line 236, in raymarch
    return self.blas.raymarch(rays, raymarch_type=raymarch_type, num_samples=num_samples, level=self.blas.max_level)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/accelstructs/octree_as.py", line 427, in raymarch
    raymarch_results = self._raymarch_uniform(rays=rays, num_samples=num_samples, level=level)
  File "/home/greau-hamard/Téléchargements/kaolin-wisp/wisp/accelstructs/octree_as.py", line 356, in _raymarch_uniform
    results = wisp_C.ops.uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The function that crashes is implemented in a compiled shared-object library, so I cannot see how it works. I checked the data passed to it: I see no difference between runs with and without the GUI, and the data shapes match what is expected. I also checked that the GPU was not out of memory or at full capacity at the moment of the crash. Given how generic the error message is, I have no other ideas about what to look at. Could you help me find where the problem comes from?
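For context on the message itself: CUDA reports "invalid configuration argument" when a kernel is launched with a grid or block size outside the device limits, and a grid size of zero is a common cause. The sketch below only illustrates that arithmetic. It is not taken from `wisp_C`; how `uniform_sample_cuda` actually computes its launch configuration is unknown here, and the helper names and the 256 threads per block are hypothetical.

```python
# Illustrative only: these helpers are hypothetical and are NOT part of wisp_C.
# Limits are for a 1D launch on devices of compute capability 3.0 and later.
MAX_THREADS_PER_BLOCK = 1024
MAX_GRID_DIM_X = 2**31 - 1


def launch_config(num_items, threads_per_block=256):
    """Compute (grid, block) the way a typical 1D kernel launch does."""
    # Ceiling division: enough blocks to cover every item.
    grid = (num_items + threads_per_block - 1) // threads_per_block
    return grid, threads_per_block


def is_valid_launch(grid, block):
    """Return True if a 1D launch with these sizes is within the limits."""
    return 1 <= block <= MAX_THREADS_PER_BLOCK and 1 <= grid <= MAX_GRID_DIM_X


if __name__ == "__main__":
    # An empty input (0 items) yields a grid of 0 blocks, which is invalid.
    for num_items in (4096, 1, 0):
        grid, block = launch_config(num_items)
        print(num_items, grid, block, is_valid_launch(grid, block))
```

If the crashing call behaves like this, one thing worth checking is whether the inputs are ever empty on the failing frame, for example when no ray from the interactive camera hits the octree. That is an assumption to verify, not a confirmed diagnosis.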

ArpegorPSGH commented 1 year ago

I decided to use NerfStudio instead, which seems even better, and works almost straight away.