Closed yangbinchao closed 2 years ago
Hi, everything works the same way; the code should be flexible to the number of frames in a snippet. The rule of thumb is that the lower the number of frames, the easier the algorithm will converge. This is because the displacement of pixels between frames is very small, and thus the photometric loss is always meaningful. However, it will make both networks less precise because parallax will also be very small. And finally, there are diminishing returns in adding more frames to a snippet, because at some point the pixel displacement is so high that a lot of pixels are not seen in both frames. There were some tests with 7-frame snippets on KITTI, but it was only marginally better than 5 frames and much slower.
As such, it all depends on your dataset: what displacement there is between two frames and what parallax you usually get between them. Generally, once your training is stable enough, you should try higher values until you see no further improvement.
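To make the trade-off concrete, here is a minimal sketch of how snippets of different lengths could be built from a frame sequence. It is not the repository's actual data loader: `make_snippets` and its `sequence_length` parameter are hypothetical names, and it assumes the target frame sits in the middle of the snippet with the reference frames on either side.

```python
def make_snippets(frames, sequence_length=5):
    """Group consecutive frames into (target, references) snippets.

    Hypothetical helper for illustration only. The target is assumed to be
    the middle frame, so sequence_length must be odd.
    """
    if sequence_length < 3 or sequence_length % 2 == 0:
        raise ValueError("sequence_length must be an odd number >= 3")
    half = (sequence_length - 1) // 2
    snippets = []
    # Only frames with enough neighbours on both sides can be targets.
    for i in range(half, len(frames) - half):
        refs = [frames[i + s] for s in range(-half, half + 1) if s != 0]
        snippets.append((frames[i], refs))
    return snippets


if __name__ == "__main__":
    frames = list(range(10))  # frame indices stand in for images
    for n in (3, 5, 7):
        snippets = make_snippets(frames, n)
        target, refs = snippets[0]
        # Largest temporal gap between the target and any reference frame.
        max_gap = max(abs(r - target) for r in refs)
        print(f"{n} frames: {len(snippets)} snippets, max gap {max_gap}")
```

With longer snippets, the farthest reference frame is further from the target, which means more parallax but also more pixels that are not visible in both frames, and each training sample costs more to process.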
Thank you for your excellent work. Recently I was thinking about the pose estimation in the paper, which uses 5 frames for training and testing. What happens if I use 3 or more frames? I look forward to your detailed answer, thanks!