HengyiWang / spann3r

3D Reconstruction with Spatial Memory
https://hengyiwang.github.io/projects/spanner

Question about Training #33

Open resurgo97 opened 1 week ago

resurgo97 commented 1 week ago

In the paper, you mentioned that "Due to GPU memory constraints, we train our model by randomly sampling 5 frames per video sequence."

From my understanding, though, the number of frames should not be much of a bottleneck with the current architecture and memory design.

For example, when the 6th frame is given, the previous hidden states are supposed to be fixed (they have already been updated), as they do not depend on future image frames.

So technically speaking, I think we could extend the sequence length indefinitely during training.

Is my understanding correct?

HengyiWang commented 1 week ago

Hi @resurgo97, during training you need to backpropagate gradients through the sequence, and the number of memory tokens grows with each frame, so you may not be able to extend the sequence length indefinitely.
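
To make the point above concrete, here is a rough back-of-the-envelope sketch in Python. It is not Spann3R's code. It assumes a hypothetical setup where every processed frame appends a fixed number of tokens to the memory, nothing is pruned, and the cost of reading from memory is counted as query-times-key attention entries. The function name `memory_read_cost` and the token counts are made up for illustration.

```python
def memory_read_cost(num_frames, tokens_per_frame):
    """Toy count of attention-score entries when each frame reads a growing memory.

    Assumptions (hypothetical, not taken from the paper or repo):
    - every processed frame appends `tokens_per_frame` tokens to the memory
    - no memory pruning or truncated backpropagation
    """
    per_frame = []
    for t in range(num_frames):
        memory_tokens = t * tokens_per_frame  # tokens written by frames 0..t-1
        per_frame.append(tokens_per_frame * memory_tokens)  # queries x keys
    # Without gradients, only the current step's entries need to be alive.
    inference_peak = max(per_frame)
    # With backprop through the whole sequence, every step's entries are kept
    # until the backward pass, so the cost is the sum over all frames.
    training_total = sum(per_frame)
    return inference_peak, training_total


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, memory_read_cost(n, tokens_per_frame=10))
```

Under these assumptions the inference-time peak grows linearly with the number of frames, while the amount retained for the backward pass grows roughly quadratically. So earlier states being "fixed" helps at inference, but in training they still have to be stored for gradients.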