RaduAlexandru / permuto_sdf

Code for our CVPR'23 paper - "PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces using Permutohedral Lattices"
https://radualexandru.github.io/permuto_sdf/
MIT License

Bounds for points input to PermutoEncoding and OccupancyGrid #2

Closed · jingsenzhu closed this 1 year ago

jingsenzhu commented 1 year ago

Hello! First of all, thank you for your excellent paper and code release!

I'm now trying to run your code on my own custom scenes, so I need to normalize them to fit within the expected bounds. Should the point positions be bounded within [0, 1], [-1, 1], or some other range? Thank you!

RaduAlexandru commented 1 year ago

Hello,

Currently I am assuming your object of interest is at the origin and fits inside a sphere of radius 0.5. So I think the easiest way is to scale your scene to fit in the range [-0.5, 0.5]. The following image shows what a typical scene looks like, with the object of interest (yellow) inside the sphere and the cameras (red), which can potentially be outside of the sphere.

[Image: scene scaling example]
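
For concreteness, here is a minimal sketch of that normalization, assuming the object points and camera centers are available as NumPy arrays (this helper is illustrative and not part of the permuto_sdf codebase):

```python
import numpy as np

def normalize_scene(object_points, cam_centers, target_radius=0.5):
    """Rescale a scene so the object of interest fits inside a sphere
    of radius 0.5 centered at the origin. Illustrative helper only.

    object_points: (N, 3) array of points roughly covering the object
    cam_centers:   (M, 3) array of camera positions in world space
    """
    center = object_points.mean(axis=0)
    radius = np.linalg.norm(object_points - center, axis=1).max()
    scale = target_radius / radius
    # apply the same translation and uniform scale to everything in the scene
    object_points = (object_points - center) * scale
    cam_centers = (cam_centers - center) * scale  # cameras may land outside the sphere
    return object_points, cam_centers
```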

Also, I recognize that at the moment it's probably not very easy to run on custom scenes. I plan to soon add a more detailed README section about that. The main part of running on a new scene is filling in the camera frames you will use, similar to what is returned by get_all_frames() of each dataloader, as seen here. For this you can follow these general steps:

I hope I didn't forget anything, but I will update the README soon to explain this part in more detail. Also, please pull the latest version of easy_pbr, as some things have changed there.
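
As a rough illustration of the kind of per-frame data such a loader provides (the field names below are hypothetical placeholders, not the actual easy_pbr Frame API):

```python
import numpy as np

# Hypothetical per-frame record mirroring what a dataloader's
# get_all_frames() conceptually returns for each image; the real
# easy_pbr Frame class may use different names and conventions.
frame = {
    "rgb": np.zeros((480, 640, 3), dtype=np.float32),  # image pixels in [0, 1]
    "K": np.array([[500.0,   0.0, 320.0],              # 3x3 camera intrinsics
                   [  0.0, 500.0, 240.0],
                   [  0.0,   0.0,   1.0]]),
    "tf_cam_world": np.eye(4),                          # 4x4 world-to-camera pose
}
```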

Either way let me know if you encounter any issues along the way :)

jingsenzhu commented 1 year ago

Thank you very much for such detailed instructions. I have two further questions about the implementation details:

  1. The scene is bounded by a sphere, while the encoding and occupancy grids have a cube/box shape. Doesn't this bounding-sphere representation waste the space in the corners of the box?
  2. In your ray-intersection code, you compute the near and far t values by performing a ray-sphere intersection. However, if the camera origin is inside the bounding sphere (which happens in my custom scenes), the near t value will be negative and samples behind the camera may be collected. I think a simple clamp-to-zero operation should address this issue.

RaduAlexandru commented 1 year ago

Thanks a lot for your input!

  1. To clarify, the PermutoEncoding can process points regardless of their range. It might work better or worse depending on the coarsest_scale and finest_scale parameters, but it will produce encodings for any input range. The occupancy grid, however, is a cubical grid and, with the default parameters, it expects points in the range [-0.5, 0.5]. The scene is bounded by a sphere of radius 0.5. I chose a sphere as the bounding primitive for the scene since it allows for an easy parametrization of the background following the NeRF++ approach. As you pointed out, this will cause some voxels in the corners of the occupancy grid to never be accessed, but I think this wasted space is not too much of an issue (see the quick volume check after this list). In the end, it is more important not to waste too much of the network's capacity optimizing the SDF at the edges of the bounding primitive, so having a sphere with a tight fit around your object is more beneficial.

  2. That's a great point that I hadn't considered until now. Indeed, ray_t_entry could end up negative and therefore create ray samples behind the camera, which is not intended. I added a clamp-to-zero to fix this (see the sketch below). One thing to consider in this regard is that if the camera is inside the bounding sphere, you will create ray samples starting directly at the camera origin. The region directly in front of a camera is typically not viewed by the other cameras, so it can be an under-constrained region. If this causes you any issues, I recommend clamping ray_t_entry to a slightly positive number instead, so that ray samples start slightly in front of the camera. In general, I tried to scale my scenes so that the cameras are outside of the bounding sphere, which ensures that all rays are created within a well-constrained region visible to all cameras.
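
On the wasted-space point from (1), a quick back-of-the-envelope check (my own arithmetic, not from the paper) shows roughly how many voxels fall outside the inscribed sphere:

```python
import math

# Fraction of the unit cube [-0.5, 0.5]^3 covered by its inscribed
# sphere of radius 0.5: (4/3) * pi * 0.5^3 = pi / 6 ~ 0.524.
covered = (4.0 / 3.0) * math.pi * 0.5 ** 3
wasted = 1.0 - covered
print(f"~{wasted:.0%} of the occupancy-grid voxels lie outside the sphere")
```

So a bit under half of the grid is never queried, which matches the point above that network capacity, rather than voxel storage, is the scarcer resource.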
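And for (2), here is a minimal standalone sketch of the clamping idea (plain NumPy, not the actual sampling code; the eps margin is a hypothetical parameter):

```python
import numpy as np

def ray_sphere_entry_exit(origin, direction, radius=0.5, eps=1e-3):
    """Intersect a ray with a sphere centered at the origin and return
    (t_entry, t_exit), clamping t_entry so that no samples are created
    behind (or exactly at) the camera. Illustrative sketch only."""
    direction = direction / np.linalg.norm(direction)
    b = 2.0 * np.dot(origin, direction)   # with a unit direction, a = 1
    c = np.dot(origin, origin) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None                       # ray misses the sphere
    sqrt_disc = np.sqrt(disc)
    t_entry = (-b - sqrt_disc) / 2.0      # negative when the origin is inside
    t_exit = (-b + sqrt_disc) / 2.0
    if t_exit <= eps:
        return None                       # sphere is entirely behind the camera
    # clamp to a small positive eps so samples start just in front
    # of the camera instead of behind it
    t_entry = max(t_entry, eps)
    return t_entry, t_exit
```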

jingsenzhu commented 1 year ago

Thank you for the explanation and the fix! One further comment: you said the sphere bound was chosen because of the NeRF++ spherical parameterization, which makes sense. A recent work (MERF) designs an interesting background parameterization that supports a cube shape, so I believe future work could adopt a bounding box and make full use of the voxel field :D