NaN weights in the beginning of training

google-research / multinerf

A Code Release for Mip-NeRF 360, Ref-NeRF, and RawNeRF

Apache License 2.0

3.58k stars 339 forks

NaN weights in the beginning of training #22

Closed CSU-NXY closed 1 year ago

CSU-NXY commented 1 year ago

Hi, thanks for your great work. I noticed that the weights become NaN after the first sampling step; however, the training pipeline does not break and the PSNR keeps increasing. I'm wondering why this happens and how you deal with NaN weights?

jonbarron commented 1 year ago

That's pretty weird, and shouldn't happen. What are you running? Did you modify the code?

CSU-NXY commented 1 year ago

I'm running the pinecone dataset using train_360.sh. The only changes I made are setting config.factor to 8 and config.batch_size to 1024 in 360.gin.

I'm wondering whether the NaN comes from the sanity-checking step in JAX, because the output looks good after the very first forward step.
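
One way to narrow down where the NaNs first appear is to count them in the weights at each step. The snippet below is only a minimal sketch: `count_nans` and the `weights` dictionary are hypothetical names made up for illustration, not part of the multinerf code, and the values are placeholder data rather than real output.

```python
import jax
import jax.numpy as jnp


def count_nans(tree):
    """Return the total number of NaN entries across all array leaves of a pytree."""
    leaves = jax.tree_util.tree_leaves(tree)
    return sum(int(jnp.isnan(leaf).sum()) for leaf in leaves)


# Hypothetical stand-in for per-level sample weights (placeholder values).
weights = {
    "level_0": jnp.array([0.2, jnp.nan, 0.5]),
    "level_1": jnp.array([0.1, 0.9]),
}

nan_count = count_nans(weights)
print(f"NaN entries in weights: {nan_count}")
```

Calling a helper like this on the real weights right after the first forward step, and again on later steps, would show whether the NaNs persist or only appear once. JAX also has a `jax_debug_nans` option, enabled with `jax.config.update("jax_debug_nans", True)`, which raises an error as soon as a computation produces a NaN.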