Hi there, the grid encoding is part of the tiny-cuda-nn framework; you can find it here: https://github.com/NVlabs/tiny-cuda-nn/blob/master/include/tiny-cuda-nn/encodings/grid.h
It can be configured with dense, tiled, and hash-based memory, the latter being the encoding described in the paper.
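If it helps, the choice is just a field in the encoding's JSON config. A minimal sketch (field names follow the tiny-cuda-nn documentation; the values here are only illustrative):

```json
{
    "otype": "Grid",
    "type": "Hash",
    "n_levels": 16,
    "n_features_per_level": 2,
    "log2_hashmap_size": 19,
    "base_resolution": 16,
    "per_level_scale": 2.0
}
```

Setting "type" to "Tiled" or "Dense" switches the backing storage; "log2_hashmap_size" is only relevant for the hash variant.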
Hi, thanks. Actually, I found it already in the stand-alone TCNN repo.
[EDIT] This "hash" seems to be compressing the traditional spatial encoding a bit. This is quite unintuitive, but apparently it works! What magic!
Edit, since I think I probably need to be a bit more specific. Is my understanding correct that the "encoded feature vectors" are just a payload to the standard <x,y,z,theta,phi> in this context?
They are a replacement for x, y, z. Theta/phi (or another parameterization of direction) are fed separately in the NeRF setting (controlled by the dir_encoding entry of the network config .json).
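For example, that entry might look something like the following (a simplified sketch; the shipped NeRF configs may wrap it in a composite encoding):

```json
"dir_encoding": {
    "otype": "SphericalHarmonics",
    "degree": 4
}
```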
Sorry, my wording was bad. I am aware this is somewhat specific to NeRF details, whereby the color and density are separated. Let me re-clarify my question in this context; let us focus on density only first (no noise/confusion from theta/phi anyway). I mean, from the paper, the hash-grid-encoding output is cascaded with <x,y,z>. You are saying it is a replacement for <x,y,z>? (Figure 3)
Regards Yao
The encoding is not concatenated with <x,y,z> (is that what you mean by "cascaded", or something else?). It really is used as a replacement for inputting <x,y,z> to the network. Plus optional extra dimensions \xi, which, to be fair, could include <x,y,z> if you'd like. But we don't do that in any of the paper results.
For neural radiance caching, \xi is defined as <theta,phi,...> indeed, but the "..." are material parameters rather than <x,y,z>.
Could you point me to the place in the paper that suggests we concatenate with <x,y,z>? We may need to clarify that.
Regarding the NeRF case (it seems I cannot attach an image on this forum?):
It is here https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf
Page 4, Figure 3, part (c). It appears that "E" here actually means "x,y,z" in the NeRF context?
Otherwise, "E" probably means anything other than "x,y,z", i.e., just something not encoded with the hash-grid encoding, if that is what is really meant?
Regards Yao
As I look into the code, the feature vector ((4) in Figure 3) in the NeRF task is a concatenation of two embeddings, extracted from the position hash grid (queried by x, y, z) and the direction hash grid (queried by theta, phi), respectively. Then, the former goes into the density network, and its output is concatenated with the latter to query the RGB network.
You can see that implemented in the function inference_mixed_precision in nerf_network.h.
EDIT: As @mmalex said, the 2D direction is not encoded in a hash-grid way, but as spherical harmonics.
This seems to be related to "dir_encoding" in base.json, whose default value is "SphericalHarmonics". In the first place, it does not seem legitimate to encode direction with a hash grid, which is only spatially defined; that's why I think they use another form of encoding here.
Thanks. If we simply focus on the density network, then yup, what Thomas said is aligned with the code, as pointed out by resurgo97. So I think the "auxiliary input" in the paper doesn't apply to the NeRF case?
Regards Yao
Right. I also think the paper could be a little more clear on that.
Hi! Thanks for the feedback about clarity in the paper! You are right: in the NeRF case we feed the output of the hashgrid encoding directly to a first MLP without any additional 'auxiliary inputs' to predict a density that is not view-dependent, and then 'chain' the first 16 neurons of hidden state from that MLP, along with a (spherical harmonic or one-blob) encoded view direction, into the 'rgb network' to predict color. We also previously implemented NeRF using the simpler approach of concatenating the view direction as an aux input alongside the hashgrid-encoded position and using a single MLP; this also works fine, but leads to a view-dependent density. When we came to make comparisons to other NeRF techniques, we decided instead to use the more complicated setup since it is closer to other NeRFs. But both methods work fine, and the hashgrid encoding itself can easily be used with or without auxiliary inputs, depending on the needs of the particular application.
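For anyone following along in the config, the two-MLP setup described above corresponds roughly to these entries (a trimmed-down sketch in the spirit of configs/nerf/base.json; the exact hyperparameters and activations in the shipped config may differ):

```json
{
    "encoding":     { "otype": "HashGrid", "n_levels": 16, "n_features_per_level": 2 },
    "dir_encoding": { "otype": "SphericalHarmonics", "degree": 4 },
    "network":      { "otype": "FullyFusedMLP", "n_neurons": 64, "n_hidden_layers": 1 },
    "rgb_network":  { "otype": "FullyFusedMLP", "n_neurons": 64, "n_hidden_layers": 2 }
}
```

The hash-grid-encoded position goes through "network" to produce density plus the hidden features that are then concatenated with the spherical-harmonics-encoded direction as input to "rgb_network".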
All right. Thanks mmalex for the clear clarification, which is helpful. Essentially, just as a hypothetical example: suppose we do not care about the density at all, only the final <R,G,B> color (I understand the color depends on density in principle). In that application case, if we only need <R,G,B>, should we be able to put <x,y,z,theta,phi> together as the input to the hash grid and connect it to the rgb network?
Regards Yao
Hi, is there a chance you could give a quick note on where the meat is regarding the "hash" part? I looked into the code, and the NeRF in TCNN seems to be a vanilla NeRF?
Regards Yao