cmusatyalab / mega-nerf-viewer


Runtime error during dynamic octree refinement on multiple scenes #4

Open lenismerino opened 2 years ago

lenismerino commented 2 years ago

Following the instructions from the README page, I downloaded both the datasets and the pre-trained models for the building, rubble, and quad scenes. I created the octrees, and the results can be inspected in the compiled viewer (mouse rotation is a bit unconventional, but it works). The initial state of the octrees produces a low-quality 3D model, so I press the M key to start the refinement. The program always crashes when the number of split candidates reaches 0; the error message looks like this:

Split candidates: 3
Added: 3, total size: 5026078
Split candidates: 3
Added: 3, total size: 5026081
Split candidates: 3
Added: 3, total size: 5026084
Split candidates: 2
Added: 2, total size: 5026086
Split candidates: 2
Added: 2, total size: 5026088
Split candidates: 4
Added: 4, total size: 5026092
Split candidates: 3
Added: 3, total size: 5026095
Split candidates: 3
Added: 3, total size: 5026098
Split candidates: 4
Added: 4, total size: 5026102
Split candidates: 4
Added: 4, total size: 5026106
Split candidates: 2
Added: 2, total size: 5026108
Split candidates: 1
Added: 1, total size: 5026109
Split candidates: 1
Added: 1, total size: 5026110
Split candidates: 0
Sample candidates: 400906
terminate called after throwing an instance of 'torch::jit::JITException'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/mega_nerf/models/nerf.py", line 40, in forward
      xyz_dim0 = self.xyz_dim
      _6 = torch.format(_0, _5, expected, xyz_dim0)
      ops.prim.RaiseException(_6)
      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
      pass

Traceback of TorchScript, original code (most recent call last):
  File "/data/hturki/mega-nerf/mega_nerf/models/nerf.py", line 122, in forward

        if x.shape[1] != expected:
            raise Exception(
            ~~~~~~~~~~~~~~~~
                'Unexpected input shape: {} (expected: {}, xyz_dim: {})'.format(x.shape, expected, self.xyz_dim))
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        input_xyz = self.embedding_xyz(x[:, :self.xyz_dim])
RuntimeError: Unexpected input shape: [7, 3] (expected: 7, xyz_dim: 3)

Aborted (core dumped)

In this case it seems that the provided input shape matches what is expected (and yet the code still crashes); however, in many of the runs the first value of the input shape is lower than the expected one:

input_xyz = self.embedding_xyz(x[:, :self.xyz_dim])
RuntimeError: Unexpected input shape: [1, 3] (expected: 7, xyz_dim: 3)

Aborted (core dumped)
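For reference, the raise comes from the guard visible in the traceback (mega_nerf/models/nerf.py, line 122), which compares x.shape[1] against an expected width. Below is a minimal standalone sketch of that check, with the expected value of 7 and xyz_dim of 3 taken from the error messages; the tensor sizes are purely illustrative, not what the viewer actually passes:

```python
import torch

# Values taken from the error messages above; purely illustrative.
expected = 7   # width the model expects along dim 1
xyz_dim = 3    # spatial coordinate dimensions

def forward_check(x: torch.Tensor) -> None:
    # Mirrors the guard shown in the traceback (mega_nerf/models/nerf.py, line 122).
    if x.shape[1] != expected:
        raise RuntimeError(
            'Unexpected input shape: {} (expected: {}, xyz_dim: {})'.format(
                list(x.shape), expected, xyz_dim))

forward_check(torch.zeros(1024, 7))      # passes: dim 1 is 7
try:
    forward_check(torch.zeros(7, 3))     # raises: dim 1 is 3, as in the log above
except RuntimeError as e:
    print(e)
```

Printing torch.jit.load(<octree model>).code on the shipped TorchScript module should show the same check, in case that helps pin down what input the viewer is assembling when the candidate count drops.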

Here is some additional info about my environment:

* Computer: 8th gen 8-core i7, 64GB of RAM, GTX 1080Ti (12GB)
* OS: Ubuntu 20.04.3
* CMake: 3.23.1
* NVIDIA software: driver 470.57.02, CUDA 11.4, cuDNN 8
* The LibTorch version used to compile the viewer is this one: https://download.pytorch.org/libtorch/cu113/libtorch-shared-with-deps-1.11.0%2Bcu113.zip (I don't know the impact of using the pre-cxx11 ABI vs. the cxx11 ABI, as it was not specified in the README)
* The Python environment was created according to the .yaml file

Any information and workaround for this issue is greatly appreciated.

IaroslavS commented 1 year ago


I'm facing the same problem. It seems that we should stop refining before "Split candidates" reaches 0.
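Roughly what I mean, as a Python sketch: the function name and the batch layout below are made up for illustration (I haven't traced the viewer's actual C++ refinement loop), but the idea is just to bail out before querying the TorchScript model once no candidates remain:

```python
import torch

def refine_step(candidates: torch.Tensor, scripted_model) -> bool:
    """One hypothetical refinement pass; `candidates` stands in for the
    batch the viewer builds for the exported mega-nerf model."""
    print(f"Split candidates: {candidates.shape[0]}")
    if candidates.shape[0] == 0:
        # Stop refining here instead of handing the model an empty or
        # oddly shaped batch, which is where the crash above occurs.
        return False
    with torch.no_grad():
        scripted_model(candidates)
    return True

# Usage (model path and collect_candidates() are hypothetical):
# model = torch.jit.load("building_octree_model.pt")
# while refine_step(collect_candidates(), model):
#     pass
```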