Tao-11-chen closed this 2 days ago
Hi @Tao-11-chen, we always sample 5 frames for training. The window size in the curriculum training is the maximum interval between sampled frames, not the number of frames. You can check this function for details: https://github.com/HengyiWang/spann3r/blob/5f2c293130fc1476d80a7deab842bada7eecc096/spann3r/datasets/base_many_view_dataset.py#L9
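To make the idea concrete, here is a minimal sketch of sampling a fixed number of frames whose consecutive gaps are bounded by a window size. It is not the code from the linked file: the function name `sample_frame_indices`, its parameters, and the way gaps are shrunk to fit short sequences are all assumptions made for illustration.

```python
import random


def sample_frame_indices(seq_len, num_frames=5, max_interval=3, rng=None):
    """Return `num_frames` increasing indices with gaps in [1, max_interval].

    Hypothetical helper for illustration only; not the repository's function.
    """
    if max_interval < 1:
        raise ValueError("max_interval must be at least 1")
    if seq_len < num_frames:
        raise ValueError("sequence is shorter than the number of frames")
    rng = rng or random.Random()

    # Draw one random gap per pair of consecutive frames.
    gaps = [rng.randint(1, max_interval) for _ in range(num_frames - 1)]

    # If the total span does not fit in the sequence, shrink the largest gap
    # until it does. Gaps never drop below 1 because seq_len >= num_frames.
    while sum(gaps) > seq_len - 1:
        largest = max(range(len(gaps)), key=gaps.__getitem__)
        gaps[largest] -= 1

    # Pick a start index so that the last frame stays inside the sequence.
    start = rng.randint(0, seq_len - 1 - sum(gaps))
    indices = [start]
    for gap in gaps:
        indices.append(indices[-1] + gap)
    return indices
```

With this sketch, the number of returned frames is always `num_frames`; only `max_interval` (the window size) controls how far apart they can be.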
Thanks for your reply!
Hello, thanks for sharing your amazing work. I find it hard to understand this part of the paper:
Due to GPU memory constraints, we train our model by randomly sampling 5 frames per video sequence. Thus, the memory bank contains only a 4-frame memory at maximum during training. To ensure the model adapts to diverse camera motions and long-term feature matching, we gradually increase the sample window size throughout the training. For the last 25% epochs, we gradually decrease the window size to ensure the training frame interval aligns with the inference frame interval.
Does it mean you sample five frames at the beginning, increase the number of sampled frames up to the whole sequence, and then decrease the number for the last 25% of epochs? Or are you adjusting kf_every? Which one do you think is better?
Thanks in advance for your help.
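The schedule described in the quoted paragraph could be sketched as follows: the window size (maximum frame interval) grows during the first part of training and shrinks again over the last 25% of epochs, while the number of sampled frames stays fixed. The linear shape, the function name `window_size_for_epoch`, and the values of `min_window`, `max_window`, and `final_window` are assumptions for illustration; the paper excerpt does not specify them.

```python
def window_size_for_epoch(epoch, total_epochs, min_window=1, max_window=10,
                          final_window=3, decay_fraction=0.25):
    """Hypothetical curriculum: ramp the window up, then down near the end."""
    if total_epochs < 2:
        raise ValueError("total_epochs must be at least 2")
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must be in [0, total_epochs)")

    # First epoch of the decay phase, clamped so both phases are non-empty.
    decay_start = int(round(total_epochs * (1.0 - decay_fraction)))
    decay_start = max(1, min(decay_start, total_epochs - 1))

    if epoch <= decay_start:
        # Linear ramp from min_window (epoch 0) to max_window (decay_start).
        t = epoch / decay_start
        window = min_window + t * (max_window - min_window)
    else:
        # Linear decay from max_window down to final_window at the last epoch.
        t = (epoch - decay_start) / (total_epochs - 1 - decay_start)
        window = max_window + t * (final_window - max_window)
    return int(round(window))
```

The returned value would then be used as the maximum frame interval when sampling the five training frames for that epoch.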