ashawkey / torch-ngp

A pytorch CUDA extension implementation of instant-ngp (sdf and nerf), with a GUI.
MIT License

maybe hashmap size of each layer should equal `resolution ** input_dim` instead of `(resolution + 1) ** input_dim`? #71

Closed: Karbo123 closed this 2 years ago

Karbo123 commented 2 years ago

First, I really appreciate this great work! I recently tried to understand the CUDA code you wrote and may have found a small bug; I'm not sure whether it is actually a bug or intentional.

I found that the hashmap size of each hashgrid layer is written as `(resolution + 1) ** input_dim` in your Python code. But when I later read the CUDA code, I found that here a variable `scale` is defined as `exp2f(level * S) * H - 1.0f`, which, if I understood correctly, equals `base_resolution * pow(per_level_scale, level) - 1.0`.

If I'm not wrong, let's consider only a 2D grid volume with `resolution == 4`, which can be visualized as: (image)

The CUDA code then gives `const float scale = 3.0f` and `const uint32_t resolution = 4`. The following CUDA code computes the above 2D grid indices (`0 <= i <= 3`) and the normalized relative coordinates (within -0.5 ~ +0.5) for each query point given by the `input` pointer. More concretely, for the `pos` variable: in the first bin the normalized coordinate ranges within [0.5, 1), in the two middle bins it ranges within (0.0, 1.0), and in the last bin it is within (0.0, 0.5]. Note that `floor(x + 0.5)` is actually `round(x)`.
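The index computation described above can be sketched in Python (a hypothetical mirror of the kernel for illustration, not the actual CUDA code; the function name `grid_pos` and the default parameters are my own):

```python
import math

def grid_pos(x, level, base_resolution=4, per_level_scale=2.0):
    # Mirror of scale = exp2f(level * S) * H - 1.0f, i.e.
    # base_resolution * per_level_scale**level - 1.0
    S = math.log2(per_level_scale)
    scale = 2.0 ** (level * S) * base_resolution - 1.0
    pos = x * scale + 0.5        # shift into the cell-centered frame
    pos_grid = math.floor(pos)   # integer grid index
    pos -= pos_grid              # fractional offset in [0, 1)
    return pos_grid, pos

# resolution == 4 gives scale == 3.0; a normalized input x = 0.9 maps to
# pos = 3.2, so pos_grid = 3 with a local offset of 0.2.
print(grid_pos(0.9, level=0))
```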

I also found that the latest tiny-cuda-nn implementation uses `stride *= grid_resolution` instead of the `stride *= (resolution + 1)` used in your code.

So my question is: since the maximum number of grid-point locations in each layer can be computed as `resolution ** input_dim` instead of `(resolution + 1) ** input_dim`, maybe it is possible (and better) to shrink the allocated memory size as I suggest?
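To get a feel for the size difference in question, here is a rough sketch (hypothetical helper, not the repo's code; the 2^19 cap is the per-level parameter limit mentioned later in the thread):

```python
# Compare the two per-level table sizes, both capped by the
# log2_hashmap_size = 19 limit discussed in this thread.
def table_sizes(resolution, input_dim, log2_hashmap_size=19):
    cap = 2 ** log2_hashmap_size
    with_plus_one = min((resolution + 1) ** input_dim, cap)
    without = min(resolution ** input_dim, cap)
    return with_plus_one, without

for res in (16, 64, 512):
    print(res, table_sizes(res, input_dim=3))
```

At coarse levels the +1 changes the dense size slightly (e.g. 17^3 = 4913 vs 16^3 = 4096 in 3D); once the dense size exceeds the 2^19 cap, both variants allocate exactly the same amount.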

ashawkey commented 2 years ago

Thanks for your suggestion! In fact, I have been confused by this too. It seems to be a corner-alignment problem, similar to `align_corners` in PyTorch's interpolation.

Here is the reason why I use `(resolution + 1)` (sorry for the low photo quality...): (image)

Since the inputs are in [0.5, 3.5], if we store data on the grid, we need grid data at [0, 1, 2, 3, 4], which is `resolution + 1` points.

I still can't figure out how to correctly interpolate all locations if we use `resolution` as in the original implementation (very likely I'm wrong...). For example, for a point at x = 3.2 (0.9 * 3 + 0.5), we get `pos_grid = 3` and `pos = 0.2`, and in this line we get `pos_grid_local = 4`, but this is out of range for `resolution = 4` (which only contains data at [0, 1, 2, 3]).

However, since the maximum number of params per layer is limited to 2^19, this +1 difference is negligible in practice. In some simple experiments on the LEGO scene, I found that the two implementations achieve similar performance.

Looking forward to your reply!

Karbo123 commented 2 years ago

Yes, now I think you are right: the allocated memory should be `(res + 1)^D`.

But I still wonder why the instant-ngp author doesn't add one when calculating the flattened 1D index.

I also wonder why we can't just remove the `+ 0.5` in the code; if we do so, it seems everything would work fine (allocate 4^2, and nothing goes out of range).
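The corner-aligned variant suggested above can be sketched as follows (hypothetical Python, not the repo's code; the clamp is my own addition to handle the x == 1.0 edge case):

```python
import math

def corner_aligned(x, resolution=4):
    # Without the + 0.5 shift: scale = resolution - 1, so x in [0, 1]
    # maps onto grid indices 0 .. resolution - 1 (corner-aligned).
    scale = resolution - 1
    pos = x * scale
    pos_grid = min(math.floor(pos), resolution - 2)  # keep pos_grid + 1 in range
    return pos_grid, pos - pos_grid

# x = 0.9 -> pos = 2.7 -> pos_grid = 2, offset 0.7; the interpolation
# corners 2 and 3 both fit in a table of resolution == 4 entries.
print(corner_aligned(0.9))
```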

Finally, I have to say that the effect is very minor: it may only affect the border of the scene, and it may be further alleviated by the MLP network.

ashawkey commented 2 years ago

You are right, we can remove the `+ 0.5` to achieve correct corner-aligned interpolation. I added this option in https://github.com/ashawkey/torch-ngp/commit/79da0e7e07506a7b44cffb0db252a175dccf32a7 and you can give it a try! Again, thanks for your time and effort!

Karbo123 commented 2 years ago

Thanks for the reply! It totally solved my problem.