POSTECH-CVLab / PeRFception

[NeurIPS2022] Official implementation of PeRFception: Perception using Radiance Fields.
Apache License 2.0

ScanNet data generation #18

Open sangrockEG opened 1 year ago

sangrockEG commented 1 year ago

First of all, thank you for publishing your implementation.

I want to generate the ScanNet dataset using the learned weights. To do so, I downloaded the files from Hugging Face, including last.ckpt.

Then, using the demo code, I tried to render the images of the first scene (scene0000_00). For rendering without additional training or evaluation, I slightly modified the final block of scannet.gin as follows:

run.run_render = True
run.run_train = False
run.run_eval = False

After that, I ran the demo code with

python -m run --ginc configs/scannet.gin --scene_name scene0000_00

However, when I ran the demo code, it seemed to take too much memory and returned the following message:

Unable to allocate array with shape (1210619520, 3) and data type float64
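
For a sense of scale, the size of the requested array can be worked out directly from the shape and dtype in the message. This is only arithmetic on the numbers shown above; the float32 line is a hypothetical comparison, not something the code is known to support.

```python
import numpy as np

# Shape and dtype copied from the error message above.
shape = (1210619520, 3)
itemsize = np.dtype(np.float64).itemsize  # bytes per float64 element

# Total bytes = rows * columns * bytes per element.
n_bytes = shape[0] * shape[1] * itemsize

print(f"float64: {n_bytes / 2**30:.1f} GiB")
# Hypothetical comparison: the same array stored as float32 would need half.
print(f"float32: {n_bytes / 2 / 2**30:.1f} GiB")
```

So the single allocation is roughly 27 GiB, which explains why it fails on a typical workstation.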

This issue was also mentioned in #11. The rendering loop (predict_step in /model/plenoxel_torch/) seems to render the image tensors sequentially and keep all of them in RAM. It would be better to fix this part so that the dataset is more accessible.
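
One possible way around keeping everything in RAM is to write each rendered frame to disk as soon as it is produced. The following is a minimal sketch of that idea, not the repository's code: `render_frame` and `render_to_disk` are hypothetical names, and `render_frame` is a dummy stand-in for the model's actual per-frame rendering.

```python
import os
import tempfile

import numpy as np


def render_frame(pose, height=4, width=6):
    """Hypothetical stand-in for the model's per-frame renderer."""
    # Returns a dummy RGB image; a real renderer would make use of `pose`.
    return np.zeros((height, width, 3), dtype=np.float32)


def render_to_disk(poses, out_dir):
    """Render one frame at a time and save it immediately.

    Only a single image is held in memory at any point, so peak memory
    does not grow with the number of poses.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, pose in enumerate(poses):
        rgb = render_frame(pose)
        path = os.path.join(out_dir, f"{i:06d}.npy")
        np.save(path, rgb)
        paths.append(path)
    return paths


# Usage with three identity poses and a temporary output directory.
poses = np.tile(np.eye(4, dtype=np.float32), (3, 1, 1))
out_dir = tempfile.mkdtemp()
paths = render_to_disk(poses, out_dir)
```

The trade-off is extra disk I/O per frame in exchange for a memory footprint that stays constant regardless of how many frames the scene has.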

Anyway, in my case, I just picked one pose (frame_id=0) and rendered a single image. The code runs without error, but it returns an unexpected result. Fortunately, at least I can see the room-like shape (probably the room of scene0000_00, right?).


It seems that there is a pose-related problem. The following (intermediate) pose tensors might be helpful for figuring out what is wrong.

original pose (before processing with pcd-related things)

[[[-9.554210e-01  1.196160e-01 -2.699320e-01  2.655830e+00]
  [ 2.952480e-01  3.883390e-01 -8.729390e-01  2.981598e+00]
  [ 4.080000e-04 -9.137200e-01 -4.063430e-01  1.368648e+00]
  [ 0.000000e+00  0.000000e+00  0.000000e+00  1.000000e+00]]]

render_pose (the finally returned one)

[[[-9.80858835e-01  2.35084399e-18 -1.94721569e-01  2.96767746e-01]
  [-1.16803752e-07  9.99999718e-01 -7.10082718e-07  3.07291136e-02]
  [ 1.94722179e-01 -1.46270149e-17 -9.80858767e-01  1.29165942e+00]
  [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]]
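
As a quick sanity check on the two matrices above, one can verify that each rotation block is a proper rotation and measure how far apart the two orientations are. This is a generic diagnostic sketch; `rotation_report` and `relative_rotation_deg` are hypothetical helper names, and the tolerance is an assumption chosen to absorb the rounding in the printed values.

```python
import numpy as np

# Values copied from the two pose dumps above (leading batch axis dropped).
original_pose = np.array([
    [-9.554210e-01,  1.196160e-01, -2.699320e-01, 2.655830e+00],
    [ 2.952480e-01,  3.883390e-01, -8.729390e-01, 2.981598e+00],
    [ 4.080000e-04, -9.137200e-01, -4.063430e-01, 1.368648e+00],
    [ 0.0,           0.0,           0.0,          1.0],
])
render_pose = np.array([
    [-9.80858835e-01,  2.35084399e-18, -1.94721569e-01, 2.96767746e-01],
    [-1.16803752e-07,  9.99999718e-01, -7.10082718e-07, 3.07291136e-02],
    [ 1.94722179e-01, -1.46270149e-17, -9.80858767e-01, 1.29165942e+00],
    [ 0.0,             0.0,             0.0,            1.0],
])


def rotation_report(pose, tol=1e-4):
    """Return (is_orthonormal, determinant) for the 3x3 rotation block."""
    R = pose[:3, :3]
    orthonormal = bool(np.allclose(R.T @ R, np.eye(3), atol=tol))
    return orthonormal, float(np.linalg.det(R))


def relative_rotation_deg(pose_a, pose_b):
    """Angle in degrees of the rotation taking pose_a's frame to pose_b's."""
    R_rel = pose_a[:3, :3].T @ pose_b[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


print("original:", rotation_report(original_pose))
print("render:  ", rotation_report(render_pose))
print("relative rotation (deg):", relative_rotation_deg(original_pose, render_pose))
```

If both blocks come out orthonormal with determinant close to +1, the matrices themselves are valid rigid transforms, and any mismatch lies in the coordinate frame or trajectory rather than in a corrupted matrix.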

I'm not very familiar with NeRF-related things, so I may have made a mistake somewhere in the steps above. Any help would be greatly appreciated.

Minhluu2911 commented 1 year ago

Have you tried using trans_info.npz to convert the poses? After loading the poses from ScanNet, convert them using the code below:

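import numpy as np

# `poses`: the poses loaded from ScanNet, shape (N, 4, 4) as implied by the indexing below
trans_info = np.load("path/to/trans_info.npz")
T = trans_info['T']
pcd_mean = trans_info['pcd_mean']
scene_scale = trans_info['scene_scale']
poses = T @ poses                # apply the stored transform T to every pose
poses[:, :3, 3] -= pcd_mean      # subtract pcd_mean from the translations
poses[:, :3, 3] *= scene_scale   # rescale the translations
poses = poses.astype(np.float32)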
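
To see what the conversion above does to the numbers, here is a small self-contained sketch that runs the same three steps on made-up values instead of a real trans_info.npz. The function name `convert_poses`, the identity `T`, and the values of `pcd_mean` and `scene_scale` are all hypothetical; the shapes (a 4x4 `T`, a length-3 `pcd_mean`, a scalar `scene_scale`) are assumptions inferred from how the arrays are used.

```python
import numpy as np


def convert_poses(poses, T, pcd_mean, scene_scale):
    """Apply the same three steps as the snippet above to (N, 4, 4) poses."""
    poses = T @ poses                # (4, 4) @ (N, 4, 4) broadcasts over N
    poses[:, :3, 3] -= pcd_mean      # subtract the mean from each translation
    poses[:, :3, 3] *= scene_scale   # rescale each translation
    return poses.astype(np.float32)


# Made-up stand-ins for the contents of trans_info.npz.
T = np.eye(4)
pcd_mean = np.array([1.0, 2.0, 3.0])
scene_scale = 0.5

# Two identity-rotation poses, both located at (3, 2, 1).
poses = np.tile(np.eye(4), (2, 1, 1))
poses[:, :3, 3] = [3.0, 2.0, 1.0]

converted = convert_poses(poses, T, pcd_mean, scene_scale)
print(converted[0, :3, 3])  # ((3, 2, 1) - (1, 2, 3)) * 0.5 = (1, 0, -1)
```

Because `T @ poses` allocates a new array, the input `poses` is left unchanged, which makes it easy to compare the poses before and after conversion.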