Open sangrockEG opened 1 year ago
Have you tried using trans_info.npz to convert the poses? After loading the poses from ScanNet, convert them with the code below:
import numpy as np

trans_info = np.load("path/to/trans_info.npz")
T = trans_info['T']                      # alignment transform
pcd_mean = trans_info['pcd_mean']        # point-cloud center
scene_scale = trans_info['scene_scale']  # scalar scale factor
poses = T @ poses                        # align poses to the normalized frame
poses[:, :3, 3] -= pcd_mean              # re-center the translations
poses[:, :3, 3] *= scene_scale           # rescale the translations
poses = poses.astype(np.float32)
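As a sanity check, the conversion above can be exercised on dummy poses to confirm the expected shapes and values. Everything below is illustrative: the identity `T`, the `pcd_mean` vector, and the `scene_scale` value are stand-ins for the arrays actually stored in trans_info.npz, and `poses` is assumed to be an (N, 4, 4) array of homogeneous camera-to-world matrices.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in trans_info.npz.
T = np.eye(4)
pcd_mean = np.array([0.5, -0.2, 1.0])
scene_scale = 0.25

# Dummy batch of 3 homogeneous camera-to-world poses with a fixed translation.
poses = np.tile(np.eye(4), (3, 1, 1))
poses[:, :3, 3] = np.array([[1.0, 2.0, 3.0]] * 3)

poses = T @ poses                 # (4,4) @ (3,4,4) broadcasts to (3,4,4)
poses[:, :3, 3] -= pcd_mean       # re-center around the point-cloud mean
poses[:, :3, 3] *= scene_scale    # rescale translations to scene units
poses = poses.astype(np.float32)

print(poses.shape)        # (3, 4, 4)
print(poses[0, :3, 3])    # [0.125 0.55  0.5  ]
```

Note that `T @ poses` relies on NumPy's matmul broadcasting, so a single 4x4 transform is applied to every pose in the batch at once.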
First of all, thank you for publishing your implementation.
I want to generate the ScanNet dataset using the learned weights. For this, I downloaded the files, including last.ckpt, from Hugging Face.
Then, using the demo code, I tried to render the images of the first scene (scene0000_00). For rendering without additional training or evaluation, I slightly modified the final block of scannet.gin as follows:
After that, I ran the demo code with
However, when I run the demo code, it seems to consume too much memory and returns the following message.
This issue was also mentioned in #11. The rendering loop (predict_step in /model/plenoxel_torch/model.py) renders the image tensors sequentially and keeps all of them in RAM. It might be better to fix this part for better accessibility of the dataset.
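One possible workaround for the memory growth described above (a sketch, not the repository's actual code) is to flush each rendered frame to disk inside the loop instead of accumulating the whole list in RAM. Here `render_fn` is a hypothetical per-pose rendering callable:

```python
import os
import numpy as np

def render_and_save(render_poses, render_fn, out_dir="renders"):
    """Render one pose at a time and write each frame to disk,
    so peak memory stays at a single image instead of the whole set.
    `render_fn` is a hypothetical callable: pose -> H x W x 3 float RGB."""
    os.makedirs(out_dir, exist_ok=True)
    for i, pose in enumerate(render_poses):
        rgb = render_fn(pose)                        # float image in [0, 1]
        img = (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)
        np.save(os.path.join(out_dir, f"frame_{i:05d}.npy"), img)
        del rgb, img                                 # drop references right away
```

Using np.save keeps the sketch dependency-free; with imageio or Pillow available, each frame could be written as a PNG instead.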
Anyway, in my case, I just picked one pose (frame_id=0) and rendered a single image. The code runs without error, but it returns an unexpected result. Fortunately, at least I can see a room-like shape (probably the room of scene0000_00, right?).
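For reference, selecting a single pose like this can be done by slicing the pose array so the batch dimension is preserved. This is only an illustration; `render_poses` below is a dummy stand-in for the actual (N, 4, 4) pose array used in the demo:

```python
import numpy as np

# Dummy (N, 4, 4) pose array standing in for the real render poses.
render_poses = np.stack([np.eye(4)] * 5)

frame_id = 0
# Slicing with frame_id:frame_id+1 keeps the batch dimension, shape (1, 4, 4),
# so downstream code expecting a batch of poses still works.
single_pose = render_poses[frame_id:frame_id + 1]
```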
It seems that there is a pose-related problem. The following (intermediate) pose tensors might be helpful for figuring out what is wrong.
original pose (before processing with pcd-related things)
render_pose (the finally returned one)
I'm not very familiar with NeRF-related things, so the aforementioned trials might be wrong somewhere. Any help would be greatly appreciated.