Time codes implementation and exploding gradients

Hi,

I am trying to reproduce this paper and have run into problems with the time codes. The paper is written quite clearly, but I am missing some details needed to reproduce it. For example, as asked in #4, I am not sure how the time codes are fed into the network. You said they are "concatenated to the 5D input, and fed into the mlp network", but the 5D input does not enter the network in one piece. In NeRF, the 3D points x go through 8 FC layers, and only then is the 2D direction d concatenated to the output of the last of those layers and processed by further FC layers. So is the time code concatenated with x at the first layer and passed through the whole MLP? Is it also included in the skip connection?

What I have done for now is concatenate the time code to the points x (after positional encoding) and then process the result as I would process x alone. To do that, I expand the time code for the given time from shape [1, D] to match the leading dimensions of the points tensor, which has shape [N, N_rays, N_samples, P], giving [N, N_rays, N_samples, D]. However, this produces exploding gradients that I have not managed to fix. I am following the instructions in the paper: the time codes have dimension D=1024, they are latent codes initialised from a normal distribution with mean 0 and std 0.01/sqrt(1024), and their learning rate is 10 times the learning rate of the network parameters.
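To make the setup above concrete, here is a minimal PyTorch sketch of what I mean. It is my own reading, not the authors' code: the tensor sizes, `N_FRAMES`, `base_lr`, the single `nn.Linear` standing in for the MLP, and the helper name `concat_time_code` are all placeholders I made up for illustration.

```python
import math

import torch
import torch.nn as nn

# Placeholder sizes, kept small for illustration
N, N_RAYS, N_SAMPLES = 2, 4, 8  # batch, rays, samples per ray
P = 60          # positional-encoding width (L=10, as in NeRF)
D = 1024        # time-code width, as stated in the paper
N_FRAMES = 5    # hypothetical number of frames

# One learnable latent code per frame, initialised to N(0, (0.01/sqrt(D))^2)
time_codes = nn.Embedding(N_FRAMES, D)
nn.init.normal_(time_codes.weight, mean=0.0, std=0.01 / math.sqrt(D))


def concat_time_code(x_encoded: torch.Tensor, frame_idx: int) -> torch.Tensor:
    """Append the code of one frame to every encoded sample point."""
    code = time_codes(torch.tensor([frame_idx]))   # [1, D]
    code = code.expand(*x_encoded.shape[:-1], D)   # [N, N_rays, N_samples, D]
    return torch.cat([x_encoded, code], dim=-1)    # [..., P + D]


# Stand-in for the first MLP layer (the real network is deeper)
mlp = nn.Linear(P + D, 256)

# Separate parameter groups so the codes get 10x the network learning rate
base_lr = 5e-4  # hypothetical value
optimizer = torch.optim.Adam(
    [
        {"params": mlp.parameters(), "lr": base_lr},
        {"params": time_codes.parameters(), "lr": 10 * base_lr},
    ]
)
```

With this layout, calling `concat_time_code(x, t)` on a tensor of shape [N, N_rays, N_samples, 60] returns a tensor whose last dimension is 60 + 1024 = 1084, which is what the first layer then sees.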

1- Did you encounter any exploding gradients, or do you have any suggestions about what I could be doing wrong?

2- Is the size of the time codes (1024) not overwhelming the weight of the points after positional encoding, which have a dimension of only 60 (when using the PE length suggested in NeRF, L=10)?

3- Are the values of the time codes not extremely small? How does that affect the overall performance?
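To put rough numbers on questions 2 and 3, here is a back-of-the-envelope check in plain Python. It assumes the NeRF-style sin/cos encoding without the raw coordinates appended (which is what gives 60 features for L=10); the sample point and the helper name `positional_encoding` are my own placeholders.

```python
import math

L = 10                       # PE frequencies per coordinate, as in NeRF
D = 1024                     # time-code width
sigma = 0.01 / math.sqrt(D)  # init std of each time-code entry


def positional_encoding(point, num_freqs=L):
    """sin/cos features at frequencies 2^k * pi for each coordinate."""
    feats = []
    for coord in point:
        for k in range(num_freqs):
            feats.append(math.sin(2.0**k * math.pi * coord))
            feats.append(math.cos(2.0**k * math.pi * coord))
    return feats


pe = positional_encoding([0.3, -0.7, 0.1])  # arbitrary example point

# Each sin/cos pair contributes sin^2 + cos^2 = 1, so the norm is sqrt(3 * L)
pe_norm = math.sqrt(sum(v * v for v in pe))

# Root-mean-square L2 norm of a freshly initialised code: sigma * sqrt(D)
code_rms_norm = sigma * math.sqrt(D)

print(len(pe), pe_norm, code_rms_norm)
```

If my arithmetic is right, the encoded point always has norm sqrt(30), about 5.48, while a freshly initialised time code has a norm of about 0.01. So at initialisation the code takes up most of the input dimensions but contributes almost nothing in magnitude, which is why I am unsure how the two are meant to be balanced.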

Thank you so much for your time. Looking forward to being able to work with your model :-)

facebookresearch / Neural_3D_Video
