Test `Renderer` integration

a-lemus96 commented 3 months ago

Perform a quick run to verify system functionality. If something fails, report bugs as they appear and, for hotfixes, describe how they were addressed.

a-lemus96 commented 3 months ago

Solved the following error by setting `resolution` to 128 in the kwargs used to instantiate the renderer and moving the code that was under `resolution` to the `grid_nlevels` key.

❯ python run-nerf.py --debug
Device: NVIDIA GeForce RTX 3090
Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 534, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 429, in main
    model, renderer, lpips_net = init_models(train_set)
                                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 90, in init_models
    'grid_nlevels': grid_nlevels}
                    ^^^^^^^^^^^^
NameError: name 'grid_nlevels' is not defined
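The `NameError` above is raised because `grid_nlevels` was referenced while building the kwargs dict before any value was bound to it. A minimal sketch of the pattern, assuming a hypothetical helper `build_renderer_kwargs` and a placeholder default for `grid_nlevels` (only the value 128 for `resolution` comes from this thread):

```python
def build_renderer_kwargs(resolution=128, grid_nlevels=1):
    """Hypothetical helper: collect keyword arguments for the renderer.

    grid_nlevels=1 is a placeholder default, not the project's real value.
    """
    # Every name used as a dict value must be bound before the dict is built;
    # an unbound name here raises NameError, as in the traceback above.
    return {"resolution": resolution, "grid_nlevels": grid_nlevels}


kwargs = build_renderer_kwargs()
print(kwargs)
```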

a-lemus96 commented 3 months ago

Running `python run-nerf.py --debug` produces the following error:

Device: NVIDIA GeForce RTX 3090
Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 534, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 429, in main
    model, renderer, lpips_net = init_models(train_set)
                                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 91, in init_models
    renderer = R.Renderer(near, far, chunksize, white_bkgd, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 40, in __init__
    self.bkgd = torch.where(white_bkgd, light, dark)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: where() received an invalid combination of arguments - got (bool, Tensor, Tensor), but expected one of:
 * (Tensor condition)
 * (Tensor condition, Tensor input, Tensor other, *, Tensor out)
 * (Tensor condition, Number self, Tensor other)
      didn't match because some of the arguments have invalid types: (bool, Tensor, Tensor)
 * (Tensor condition, Tensor input, Number other)
      didn't match because some of the arguments have invalid types: (bool, Tensor, Tensor)
 * (Tensor condition, Number self, Number other)
      didn't match because some of the arguments have invalid types: (bool, Tensor, Tensor)

The problem was solved by substituting line 40 with

self.bkgd = light if white_bkgd else dark
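As the traceback shows, every `torch.where` overload expects a `Tensor` condition, so a plain Python `bool` is rejected. Since `white_bkgd` is a single flag rather than a per-element mask, a conditional expression is enough. A small sketch of the selection logic, using tuples as stand-ins for the `light` and `dark` tensors (the helper name `select_background` is hypothetical):

```python
def select_background(white_bkgd, light, dark):
    """Pick one background colour from a single Python bool flag."""
    # A conditional expression works with a plain bool; no tensor mask needed.
    return light if white_bkgd else dark


light = (1.0, 1.0, 1.0)  # stand-in for a white RGB tensor
dark = (0.0, 0.0, 0.0)   # stand-in for a black RGB tensor

print(select_background(True, light, dark))
print(select_background(False, light, dark))
```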

a-lemus96 commented 3 months ago

Due to the error message

AttributeError: 'Renderer' object has no attribute 'to'. Did you mean: 'tf'?

I had to add a `to` method to the `Renderer` ADT to transfer the `OccGridEstimator` to the device.
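A minimal sketch of what such a `to` method could look like. It assumes the renderer keeps the estimator in an attribute called `estimator` (the name that appears in later tracebacks) and uses a hypothetical `FakeEstimator` stand-in so the example runs without a GPU; it is not the actual implementation in this repository.

```python
class FakeEstimator:
    """Stand-in for an estimator object that exposes .to(device)."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


class Renderer:
    def __init__(self, estimator):
        self.estimator = estimator

    def to(self, device):
        """Move internal state to `device` and return self for chaining."""
        self.estimator = self.estimator.to(device)
        return self


# "cuda:0" is only a label here; the stand-in never touches a real device.
renderer = Renderer(FakeEstimator()).to("cuda:0")
print(renderer.estimator.device)
```

Returning `self` mirrors the convention of `torch.nn.Module.to`, so `renderer = renderer.to(device)` and a bare `renderer.to(device)` both work.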

a-lemus96 commented 3 months ago

Add `to` method to `Renderer` ADT

This is intended to solve:

> Due to the error message
> AttributeError: 'Renderer' object has no attribute 'to'. Did you mean: 'tf'?
> I had to add a `to` method to the `Renderer` ADT to transfer the `OccGridEstimator` to the device.

a-lemus96 commented 3 months ago

Solve `TypeError: 'bool' object is not callable` for `Renderer`

❯ python run-nerf.py --debug
Device: NVIDIA GeForce RTX 3090
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torchvision/models/_utils.py:208: UserWarning:

The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.

/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torchvision/models/_utils.py:223: UserWarning:

Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.

Loading model from: /home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/lpips/weights/v0.1/vgg.pth
[NeRF]:   0%|                                                                                                                      | 0/8000 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 534, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 434, in main
    train(
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 225, in train
    renderer.train()
TypeError: 'bool' object is not callable
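The cause is not recorded in this thread. One common way to get this message is an instance attribute that shares its name with a method, so `renderer.train` resolves to a `bool` instead of a function. A minimal reproduction under that assumption (both classes are hypothetical, not the project's `Renderer`):

```python
class BrokenRenderer:
    def __init__(self):
        self.train = True  # instance attribute shadows the method below

    def train(self):
        return self


class FixedRenderer:
    def __init__(self):
        self.training = True  # distinct attribute name, no clash

    def train(self):
        self.training = True
        return self


message = ""
try:
    BrokenRenderer().train()  # looks up the bool, then tries to call it
except TypeError as err:
    message = str(err)

print(message)
print(FixedRenderer().train().training)
```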

a-lemus96 commented 3 months ago

Solve renderer attribute error:

  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 87, in render_rays
    near_plane=self.near,
               ^^^^^^^^^
AttributeError: 'Renderer' object has no attribute 'near'
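The traceback suggests that `render_rays` reads `self.near` although the constructor never stored it. A sketch of the likely fix, assuming the constructor signature seen in the earlier traceback (`near, far, chunksize, white_bkgd, **kwargs`); the helper `sampling_args`, the `far_plane` key, and the numeric values are illustrative assumptions:

```python
class Renderer:
    def __init__(self, near, far, chunksize, white_bkgd, **kwargs):
        # Keep the scene bounds on the instance so later methods can read them.
        self.near = near
        self.far = far
        self.chunksize = chunksize
        self.white_bkgd = white_bkgd
        self.kwargs = kwargs

    def sampling_args(self):
        """Hypothetical helper: bounds forwarded to the sampling call."""
        return {"near_plane": self.near, "far_plane": self.far}


# Placeholder values, chosen only to exercise the sketch.
renderer = Renderer(2.0, 6.0, 1024, True, resolution=128)
print(renderer.sampling_args())
```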

a-lemus96 commented 3 months ago

Solve CUDA INTERNAL ASSERT FAILED error

Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 534, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 434, in main
    train(
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 234, in train
    render_output = renderer.render_rays(rays_o, rays_d, model)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 82, in render_rays
    ray_idxs, t_starts, t_ends = self.estimator.sampling(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/nerfacc/estimators/occ_grid.py", line 164, in sampling
    intervals, samples, _ = traverse_grids(
                            ^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/nerfacc/grid.py", line 158, in traverse_grids
    t_mins, t_maxs, hits = ray_aabb_intersect(rays_o, rays_d, aabbs)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/nerfacc/grid.py", line 43, in ray_aabb_intersect
    t_mins, t_maxs, hits = _C.ray_aabb_intersect(
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/nerfacc/cuda/__init__.py", line 13, in call_cuda
    return getattr(_C, name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED at "/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h":25, please report a bug to PyTorch.

a-lemus96 commented 3 months ago

Encountered the following error

Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 535, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 435, in main
    train(
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 268, in train
    loss.backward()
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

a-lemus96 commented 3 months ago

Performing a full training loop, including the final validation computation and video rendering, using wandb.

The command being run is `python run-nerf.py --n_iters=1000`

a-lemus96 commented 3 months ago

Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 540, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 467, in main
    val_metrics = validation(
                  ^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 132, in validation
    rgbs, _ = renderer.render_poses((H, W, focal),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 136, in render_poses
    rays_o, rays_d = utils.get_rays(H, W, focal, poses, device)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/utils/utilities.py", line 77, in get_rays
    dirs_w = torch.sum(dirs * poses, axis=-1)
                       ~~~~~^~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

a-lemus96 commented 3 months ago

The previous solution from #69 causes a problem when rendering poses: in some cases rays do not intersect the grid but should still be treated as valid tensors in the final rendered image.

Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 540, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 467, in main
    val_metrics = validation(
                  ^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 132, in validation
    rgbs, _ = renderer.render_poses((H, W, focal),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 154, in render_poses
    (rgb, _, depth, _), _ = self.render_rays(rays_o, rays_d, model)

a-lemus96 commented 3 months ago

Performance concern: the number of iterations per second is 10x lower than in the previous version.

a-lemus96 commented 3 months ago

Traceback (most recent call last):
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 541, in <module>
    main()
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 441, in main
    train(
  File "/home/lemus/projects/fs-nerf/src/run-nerf.py", line 301, in train
    val_metrics = validation(
                  ^^^^^^^^^^^
TypeError: validation() takes 6 positional arguments but 7 were given
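The message means the call site passes one more positional argument than the signature of `validation` accepts. A minimal reproduction, with hypothetical parameter names since the real signature is not shown in this thread:

```python
def validation(hwf, model, renderer, val_set, device, step):
    """Hypothetical six-parameter signature; the names are placeholders."""
    return {"step": step}


args = ("hwf", "model", "renderer", "val_set", "cuda:0", 100)

message = ""
try:
    validation(*args, "extra")  # a seventh positional argument
except TypeError as err:
    message = str(err)

# Passing by keyword makes a mismatch between caller and signature obvious.
val_metrics = validation(
    hwf="hwf", model="model", renderer="renderer",
    val_set="val_set", device="cuda:0", step=100,
)
print(message)
```

Either the extra argument has to be dropped at the call site or the signature has to gain a matching parameter; which one applies depends on what the seventh value was meant to carry.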

a-lemus96 commented 3 months ago

Unpacking error for `render_poses`

  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 151, in render_poses
    (rgb, _, depth, _), _ = self.render_rays(rays_o, rays_d, model)
    ^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
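The target `(rgb, _, depth, _), _` expects a two-element result whose first element is itself a four-element sequence. The error appears when the callee returns more than two top-level values, for example a flat tuple. A small reproduction with hypothetical stand-ins for `render_rays`:

```python
def render_rays_flat():
    """Returns five top-level values (hypothetical stand-in)."""
    return "rgb", "opacity", "depth", "extras", 42


def render_rays_nested():
    """Returns (4-tuple, extra), matching the unpacking target."""
    return ("rgb", "opacity", "depth", "extras"), 42


message = ""
try:
    (rgb, _, depth, _), _ = render_rays_flat()
except ValueError as err:
    message = str(err)

# The nested return shape unpacks cleanly.
(rgb, _, depth, _), n_samples = render_rays_nested()
print(message)
print(rgb, depth, n_samples)
```

The return shape of `render_rays` and the unpacking in `render_poses` have to change together; updating only one side reproduces the error.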

a-lemus96 commented 3 months ago

Assertion error: sigmas must have shape of (N,)! Got torch.Size([])

  File "/home/lemus/projects/fs-nerf/src/render/renderer.py", line 105, in render_rays
    data = rendering(t_starts, t_ends, ray_idxs, n_rays=len(rays_o),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lemus/miniconda3/envs/nerf/lib/python3.11/site-packages/nerfacc/volrend.py", line 97, in rendering
    sigmas.shape == t_starts.shape
AssertionError: sigmas must have shape of (N,)! Got torch.Size([])

a-lemus96 / fs-nerf

Test `Renderer` integration #61