HengyiWang / spann3r

3D Reconstruction with Spatial Memory
https://hengyiwang.github.io/projects/spanner

Question on curriculum training #13

Closed: Tao-11-chen closed this issue 2 days ago

Tao-11-chen commented 2 days ago

Hello, thanks for sharing your amazing work. I am finding it hard to understand this part of the paper:

Due to GPU memory constraints, we train our model by randomly sampling 5 frames per video sequence. Thus, the memory bank contains only a 4-frame memory at maximum during training. To ensure the model adapts to diverse camera motions and long-term feature matching, we gradually increase the sample window size throughout the training. For the last 25% epochs, we gradually decrease the window size to ensure the training frame interval aligns with the inference frame interval.

Does this mean you sample five frames at the beginning, increase the number of sampled frames up to the whole sequence, and then decrease it over the last 25% of epochs? Or are you adjusting kf_every instead? Which approach do you think is better?

Thanks in advance for your help.

HengyiWang commented 2 days ago

Hi @Tao-11-chen, we always sample 5 frames for training. The window size in the curriculum training is the maximum interval between sampled frames, not the number of frames. You can check this function for details: https://github.com/HengyiWang/spann3r/blob/5f2c293130fc1476d80a7deab842bada7eecc096/spann3r/datasets/base_many_view_dataset.py#L9
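
For readers following along, here is a minimal sketch of the idea described above: the number of sampled frames stays at 5, and only the maximum allowed gap between frames changes over training. This is not the code from the linked file. The function names `window_size_at` and `sample_frame_indices`, the linear ramp, the default sizes, and the reading of "interval" as the gap between consecutive sampled frames are all assumptions made for illustration.

```python
import random


def window_size_at(progress, min_size=1, max_size=10, decay_start=0.75):
    """Hypothetical schedule: maximum frame interval at a given training progress.

    progress is the fraction of training completed, in [0, 1]. The window
    grows linearly until decay_start, then shrinks linearly back to min_size
    over the remaining epochs (the "last 25%" when decay_start is 0.75).
    The linear shape and the default sizes are assumptions, not paper values.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if not 0.0 < decay_start < 1.0:
        raise ValueError("decay_start must be in (0, 1)")
    if progress < decay_start:
        frac = progress / decay_start
    else:
        frac = (1.0 - progress) / (1.0 - decay_start)
    return min_size + round(frac * (max_size - min_size))


def sample_frame_indices(seq_len, max_interval, num_frames=5, rng=random):
    """Sample num_frames increasing indices whose consecutive gaps are at most max_interval."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    if seq_len < num_frames:
        raise ValueError("sequence is shorter than the number of frames")
    if max_interval < 1:
        raise ValueError("max_interval must be at least 1")
    # Cap the gap so that all frames are guaranteed to fit in the sequence.
    cap = min(max_interval, (seq_len - 1) // (num_frames - 1))
    gaps = [rng.randint(1, cap) for _ in range(num_frames - 1)]
    # Pick a start index that leaves room for the total span of the gaps.
    start = rng.randint(0, seq_len - 1 - sum(gaps))
    indices = [start]
    for gap in gaps:
        indices.append(indices[-1] + gap)
    return indices


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch, total in [(0, 100), (50, 100), (75, 100), (99, 100)]:
        window = window_size_at(epoch / total)
        print(epoch, window, sample_frame_indices(200, window, rng=rng))
```

Under this sketch, the memory bank still sees at most 4 earlier frames per training sample throughout; the curriculum only changes how far apart in the video those frames may be.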

Tao-11-chen commented 2 days ago

Thanks for your reply!