chenhsuanlin / bundle-adjusting-NeRF

BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

Train loss converged, but val loss does not #44

Closed ylhua closed 2 years ago

ylhua commented 2 years ago

I am training BARF on nuScenes, an autonomous-driving dataset that provides camera-to-world transform matrices, and I normalize the translations to lie between 1 and 10. Visualizing the training process with TensorBoard, I found that the train loss converged but the val loss went up. Do you have any ideas about the reason for this?
[Screenshot from 2022-06-20 18-24-10] [Screenshot from 2022-06-20 18-28-35]
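For context, translation rescaling of this kind might look roughly like the sketch below (the function name, the exact scaling rule, and the Nx4x4 camera-to-world layout are illustrative assumptions, not the actual preprocessing code):

```python
import numpy as np

def normalize_translations(c2w_poses, target_max=10.0):
    """Rescale the translation part of Nx4x4 camera-to-world matrices.

    Rotations are left untouched; translations are divided by a common
    factor so that the largest camera offset becomes `target_max`.
    """
    poses = np.asarray(c2w_poses, dtype=np.float64).copy()
    t = poses[:, :3, 3]                      # (N, 3) camera centers
    scale = np.abs(t).max() / target_max     # one global scale factor
    poses[:, :3, 3] = t / scale
    return poses, scale
```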

chenhsuanlin commented 2 years ago

Hi @huaahuaa, please see #23 as it is likely the same issue. Can you check if you cloned from the latest main branch?

ylhua commented 2 years ago

> Hi @huaahuaa, please see #23 as it is likely the same issue. Can you check if you cloned from the latest main branch?

Thanks for your timely reply. My code is the newest version. I wonder whether BARF can fit outdoor scenes. As I mentioned before, I normalized the pose matrices because outdoor autonomous-driving datasets have large translations. When I used the poses directly and set the depth range to 1 to 60, the train loss stayed flat and BARF's output just stayed at 1. After normalizing the poses, I got the results shown above. So is BARF suitable for unbounded outdoor scenes, and does normalizing the poses hurt its performance?
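One detail worth making explicit here: if the poses are rescaled by some factor, the metric depth range has to be rescaled by the same factor, otherwise the rays no longer cover the actual scene extent. A minimal sketch (values and variable names are only illustrative):

```python
# `scale` is the factor the camera translations were divided by.
raw_near, raw_far = 1.0, 60.0              # metric bounds before rescaling
near, far = raw_near / scale, raw_far / scale
```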

chenhsuanlin commented 2 years ago

I haven't tested BARF on unbounded scenes. I would imagine that having to set a large depth range would make optimization much more difficult, as it would be harder to represent the entire (unbounded) scene. Something like NDC reparametrization could help, but unfortunately I don't have a confident answer for this at the moment.
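For reference, the NDC warp from the original NeRF code release looks like the sketch below; it is not part of this repository, and it assumes a roughly forward-facing capture with the camera looking down -z (the `(..., 3)` ray tensor shapes are an assumption):

```python
import torch

def ndc_rays(H, W, focal, near, rays_o, rays_d):
    """Warp rays into normalized device coordinates (NeRF, Appendix C)."""
    # Shift ray origins onto the near plane
    t = -(near + rays_o[..., 2]) / rays_d[..., 2]
    rays_o = rays_o + t[..., None] * rays_d
    # Project origins and directions into the unit cube
    o0 = -1.0 / (W / (2.0 * focal)) * rays_o[..., 0] / rays_o[..., 2]
    o1 = -1.0 / (H / (2.0 * focal)) * rays_o[..., 1] / rays_o[..., 2]
    o2 = 1.0 + 2.0 * near / rays_o[..., 2]
    d0 = -1.0 / (W / (2.0 * focal)) * (rays_d[..., 0] / rays_d[..., 2] - rays_o[..., 0] / rays_o[..., 2])
    d1 = -1.0 / (H / (2.0 * focal)) * (rays_d[..., 1] / rays_d[..., 2] - rays_o[..., 1] / rays_o[..., 2])
    d2 = -2.0 * near / rays_o[..., 2]
    return torch.stack([o0, o1, o2], -1), torch.stack([d0, d1, d2], -1)
```

Note that the warp divides by the z components of ray origins and directions, so rays that are nearly parallel to the reference image plane (common with side-facing cameras on a driving rig) can blow up numerically.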

ylhua commented 2 years ago

Thanks! That helps a lot. I will give NDC space a try.

ylhua commented 2 years ago

I gave NDC space a try, but the loss became NaN and training stopped at this position: https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/803291bd0ee91c7c13fb5cc42195383c5ade7d15/model/nerf.py#L235. I also tried shrinking the scale of the training data. From visualizing the training process, I think the network has learned something for the novel-view-synthesis task, but the pose optimization has not. The training loss was still decreasing, but the val loss was not.
[Screenshot from 2022-06-27 14-49-28] [Screenshot from 2022-06-27 15-05-12]
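In case it helps narrow this down, a hedged sketch of how one could locate where the NaNs first appear (the helper and the commented variable names are placeholders, not from the repo):

```python
import torch

# Anomaly detection makes the backward pass report which op produced a NaN,
# at the cost of much slower training -- useful only for a short debug run.
torch.autograd.set_detect_anomaly(True)

def assert_finite(name, tensor):
    """Raise as soon as a tensor in the pipeline stops being finite."""
    if not torch.isfinite(tensor).all():
        bad = (~torch.isfinite(tensor)).sum().item()
        raise RuntimeError(f"{name}: {bad}/{tensor.numel()} non-finite values")

# Example checks around ray generation / compositing:
# assert_finite("rays_d", rays_d)    # NDC divides by the z component, so
#                                    # directions with z ~ 0 produce inf/NaN
# assert_finite("weights", weights)  # compositing weights before summation
```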

ylhua commented 2 years ago

I applied the same operation to the camera poses as in https://github.com/chenhsuanlin/bundle-adjusting-NeRF/issues/46#issuecomment-1163894872.

chenhsuanlin commented 2 years ago

Unfortunately I don't have a good idea what the cause would be. I haven't tried BARF with NDC before; it would be interesting to try, but there might also be unanticipated issues. I'm also not familiar with the nuScenes dataset; dynamic objects, illumination changes, etc. may also break the optimization. I'm also not sure why `ray` still becomes NaN (I suppose it has something to do with NDC) -- if you happen to find the root cause, I'm happy to review a PR for it.

chenhsuanlin commented 2 years ago

Closing due to inactivity, please feel free to reopen if necessary.