Open crepejung00 opened 1 month ago
Hi @crepejung00, yes, your understanding is correct. As for forgetting, it is more like "error accumulation", since we do not have a module that refines memory features as more observations are included. Also, since our training uses 5-frame sequences due to GPU memory constraints, this limits the size of the spatial region that the current version of Spann3R can handle. (Please refer to Sec. 4.4 for a detailed discussion of the current limitations.)
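To make the "error accumulation" point concrete, here is a minimal toy sketch. It is not Spann3R code: it assumes a one-dimensional quantity where each new frame is registered against the previous estimate with a small independent Gaussian error and nothing is ever refined afterwards. The function name `rms_drift` and the parameters `step_noise` and `trials` are hypothetical, chosen only for illustration.

```python
import math
import random


def rms_drift(num_frames, step_noise=0.01, trials=200, seed=0):
    """RMS error of the last frame in a toy chain with no refinement.

    Assumption (illustrative only): every frame adds an independent
    Gaussian error on top of the error already carried by the estimate
    it was registered against.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        error = 0.0
        for _ in range(num_frames):
            # Per-frame error is added and never corrected later.
            error += rng.gauss(0.0, step_noise)
        total_sq += error * error
    return math.sqrt(total_sq / trials)


if __name__ == "__main__":
    for n in (5, 300, 1000):
        print(n, round(rms_drift(n), 3))
```

Under these assumptions the drift grows roughly with the square root of the number of frames, so a sequence far longer than the 5-frame training window ends with a much larger error even though no individual step is worse than the others.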
Hi, thanks for the great work and the quick release of the code! I have a question regarding the memory module used in Spann3R. I noticed that you use an approach similar to XMem, which was originally designed for Video Object Segmentation and aims to store frequently used prototypes in long-term memory. My understanding is that the long-term memory in Spann3R would likely store geometric prototypes that are frequently used or essential for establishing correspondence with other views, so that the 3D points can be decoded in one global coordinate system. I have two questions:
1. Is my current understanding correct?
2. Have you noticed any forgetting issues when handling more frames, such as n > 300 or n > 1000?
Thanks!