Thanks for your suggestion! In fact, I have been confused by this too. It seems to be an align-corners problem, similar to the `align_corners` option in PyTorch's interpolation.
Here is the reason why I use `(resolution + 1)` (sorry for the low photo quality...): since the scaled inputs `pos = x * scale + 0.5` lie in `[0.5, 3.5]`, storing data on the grid requires grid points at `[0, 1, 2, 3, 4]`, which is `resolution + 1` points per dimension.
I still can't figure out how to correctly interpolate all locations if we use `resolution` as in the original implementation (very likely I'm wrong...). For example, for a point at `x = 3.2` (i.e. `0.9 * 3 + 0.5`), we get `pos_grid = 3` and `pos = 0.2`, and in this line the upper corner `pos_grid + 1` gives `pos_grid_local = 4`, which is out of range for `resolution = 4` (it only holds data at indices `[0, 1, 2, 3]`).
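To make the out-of-range corner concrete, here is a minimal Python sketch of the per-level index math described above (the names `resolution`, `scale`, `frac` are illustrative, not the kernel's exact variables):

```python
import math

resolution = 4                       # grid resolution H at this level
scale = resolution - 1.0             # exp2f(level * S) * H - 1.0f with level = 0, H = 4

x = 0.9                              # normalized input coordinate in [0, 1]
pos = x * scale + 0.5                # 0.9 * 3 + 0.5 = 3.2, so pos lies in [0.5, 3.5]
pos_grid = math.floor(pos)           # lower corner of the cell: 3
frac = pos - pos_grid                # offset inside the cell: 0.2

# Linear interpolation reads both corners of the cell, so the upper
# corner index is pos_grid + 1 = 4, which only exists if we store
# resolution + 1 = 5 grid points per dimension.
print(pos_grid, pos_grid + 1, frac)  # 3 4 0.2
```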
However, since the maximum number of parameters per level is capped at `2^19`, this `+1` difference is negligible in practice. In some simple experiments on the LEGO scene, I found that the two implementations achieve similar performance.
Looking forward to your reply!
Yes, now I think you are right. The allocated memory should be `(res + 1)^D`.
But I still wonder why the instant-ngp author doesn't add one when calculating the flattened 1D index?
And I also wonder why we can't just remove the `+ 0.5` in the code; if we do, everything seems to work fine (we allocate `4^2`, and indices never go out of range). Finally, I have to say that the effect is very minor: it can only affect the border of the scene, and it may be further alleviated by the MLP network that follows.
You are right, we can remove the `+ 0.5` to achieve correct corner-aligned interpolation. I added this option in https://github.com/ashawkey/torch-ngp/commit/79da0e7e07506a7b44cffb0db252a175dccf32a7, so you can give it a try!
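For reference, a minimal sketch of what removing the `+ 0.5` changes; the clamp on the lower corner is one assumed way to keep the boundary point in range, not necessarily what the commit does:

```python
import math

resolution = 4
scale = resolution - 1.0             # 3.0

def corners(x):
    pos = x * scale                  # no + 0.5: pos lies in [0, 3] (corner-aligned)
    # Clamp so that pos_grid + 1 never exceeds resolution - 1; at x = 1.0
    # the upper corner then carries weight 1 and we still read index 3.
    pos_grid = min(math.floor(pos), resolution - 2)
    frac = pos - pos_grid
    return pos_grid, pos_grid + 1, frac

print(corners(0.9))                  # (2, 3, 0.7): both corners fit in [0, 3]
print(corners(1.0))                  # (2, 3, 1.0): boundary stays in range,
                                     # so a resolution ** D table (4^2) suffices
```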
Again, thanks for your time and effort!
Thanks for the reply! It totally solved my problem.
First, I really appreciate this great work! I recently tried to understand the CUDA code you wrote and may have found a small bug; I'm not sure whether it's a bug or intentional.
I found that the hashmap size of each hashgrid level is computed as `(resolution + 1) ** input_dim` in your Python code. But when I later read the CUDA code, I found that this line defines a variable `scale` as `exp2f(level * S) * H - 1.0f`; if I understood correctly, that is `base_resolution * pow(per_level_scale, level) - 1.0`.
If I'm not wrong, let's consider only a 2D grid volume with `resolution == 4`, which can be visualized as a 4 × 4 array of grid points. Then the CUDA code gives us `const float scale = 3.0f` and `const uint32_t resolution = 4`. The following CUDA code then computes the 2D grid indices (`0 <= i <= 3`) and the relative coordinates (within `-0.5 ~ +0.5` of the nearest grid point) for each query point given by the `input` pointer. More concretely, the `pos` variable should range within `[0.5, 1)` for the first bin, within `(0.0, 1.0)` for the two middle bins, and within `(0.0, 0.5]` for the last bin. Note that `floor(x + 0.5)` is actually `round(x)`.
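To check those per-bin ranges numerically, here is a tiny sketch of the same `pos` computation (using the `scale = 3.0` value above; the variable names are illustrative):

```python
import math

scale = 3.0                          # resolution 4, so scale = 4 - 1

for x in (0.0, 0.2, 0.5, 0.8, 1.0):  # normalized inputs in [0, 1]
    pos = x * scale + 0.5            # floor(pos) rounds x * scale to the nearest index
    pos_grid = math.floor(pos)
    frac = pos - pos_grid
    print(f"x = {x:.1f}  pos_grid = {pos_grid}  frac = {frac:.2f}")

# x = 0.0  pos_grid = 0  frac = 0.50   (first bin starts at 0.5)
# x = 0.2  pos_grid = 1  frac = 0.10
# x = 0.5  pos_grid = 2  frac = 0.00
# x = 0.8  pos_grid = 2  frac = 0.90
# x = 1.0  pos_grid = 3  frac = 0.50   (last bin ends at 0.5)
```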
I also found that in the latest `tiny-cuda-nn` implementation, the author uses `stride *= grid_resolution` instead of the `stride *= (resolution + 1)` used in your code.

So, my question is: since the maximum number of grid points per level can be computed as `resolution ** input_dim` rather than `(resolution + 1) ** input_dim`, maybe it's possible (and better) to shrink the allocated memory size as I suggest?
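For concreteness, the two per-level sizes being compared (the `2^19` cap mentioned above corresponds to `log2_hashmap_size = 19`; the variable names here are illustrative):

```python
resolution, input_dim = 4, 2
log2_hashmap_size = 19

with_plus_one = (resolution + 1) ** input_dim   # Python side of torch-ngp: 25
without = resolution ** input_dim               # suggested size / tiny-cuda-nn stride: 16

# Either way the dense size is capped by the hashmap size, so at coarse
# levels the +1 costs a few extra entries, and at fine levels both hit 2^19.
print(with_plus_one, without, 2 ** log2_hashmap_size)  # 25 16 524288
```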