facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.
MIT License

Train on 360-degree video? #19

Open neronicolo opened 1 year ago

neronicolo commented 1 year ago

Hi,

Thanks for your work. Can we use equirectangular images as a dataset?

ameuleman commented 1 year ago

Hi, yes, but without depth and flow losses for now. Initial tests also suggest it benefits from a higher translation learning rate and from skipping frames (if the video is slow-paced):

python localTensoRF/train.py --datadir ${SCENE_DIR} --logdir ${LOG_DIR} --fov 360 --lr_t_init 0.001 --frame_step 4 --loss_depth_weight_inital 0 --loss_flow_weight_inital 0

Please let me know how it goes.

neronicolo commented 1 year ago

Amazing, thanks!

neronicolo commented 1 year ago

Hi, the results could be better. The camera path looks off. It should be a straight line since it's a straight street, but it looks like a winding road.

ameuleman commented 1 year ago

Hi, initial experiments on 360 videos seemed to work well. Would you mind sharing the video or a frame? One potential issue comes to mind: 360 videos often contain dynamic elements that require masking, and we do not handle dynamic objects.

neronicolo commented 1 year ago

Hi, sure, here is the link. I've uploaded the original video, the synthesized video, and a camera pose video. The command was:

python localTensoRF/train.py --datadir ${SCENE_DIR} --logdir ${LOG_DIR} --fov 360 --lr_t_init 0.001 --frame_step 10 --loss_depth_weight_inital 0 --loss_flow_weight_inital 0

There are no dynamic objects. Thanks!

ameuleman commented 1 year ago

Hi, thanks. The car is a dynamic element that needs to be masked out. Since it covers a large portion of the frame, it hurts pose estimation severely. Luckily, it is always at the same location in the image, which will make masking easy. Putting the following image in ${SCENE_DIR}/masks should improve results.
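Since the car sits at a fixed position in every frame, a static mask like the one suggested above can be generated programmatically. This is only a sketch: the frame size (1920x960), the fraction of the frame covered by the car, and the convention that black (0) pixels are ignored while white (255) pixels are kept are all assumptions here, not confirmed details of localrf's mask format.

```python
# Sketch: generate a static mask that blanks the bottom band of each
# equirectangular frame (where the camera-mounted car appears).
# Assumptions: grayscale mask, same resolution as the frames,
# 0 = masked out, 255 = kept. Sizes below are placeholders.
import numpy as np
from PIL import Image

W, H = 1920, 960         # hypothetical 2:1 equirectangular frame size
car_fraction = 0.25      # hypothetical: car occupies the bottom 25%

mask = np.full((H, W), 255, dtype=np.uint8)   # keep everything by default
mask[int(H * (1 - car_fraction)):, :] = 0     # mask out the bottom band

Image.fromarray(mask).save("mask.png")        # copy into ${SCENE_DIR}/masks
```

Adjust `car_fraction` until the saved image covers the car in every frame; because the rig is static relative to the camera, one mask serves the whole video.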

neronicolo commented 1 year ago

Hi Andreas, I cropped the images from the bottom before I started training. If you look at the other video I uploaded, you will see that the car is not visible in the synthesized video. Thanks for the mask tip.

ameuleman commented 1 year ago

Hi, Cropping the image breaks the model as we are expecting full equirectangular images.
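A full 360x180-degree equirectangular panorama always has a 2:1 width-to-height ratio, so a quick aspect-ratio check can catch cropped frames before training. A minimal sketch (the helper name and example sizes are illustrative, not part of localrf):

```python
# Sketch: sanity-check that frames are full equirectangular panoramas.
# A full 360x180-degree equirectangular image is exactly 2:1; a frame
# cropped from the bottom (as in this thread) fails the check.
def is_full_equirectangular(width: int, height: int) -> bool:
    return width == 2 * height

print(is_full_equirectangular(1920, 960))   # full panorama -> True
print(is_full_equirectangular(1920, 720))   # bottom-cropped -> False
```

Rather than cropping, masking (as suggested above) removes the car from the losses while keeping the full panorama the model expects.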