Closed. CeZh closed this issue 6 months ago.
To better describe my question, I attached a figure here. Moreover, since we can train with a batch size larger than one, how do you handle the instance bank update when the number of frames per scene differs within the same batch (say the batch size is 2, scene_1 has 16 frames, and scene_2 has 20 frames)? Thanks!
There is no need to reset. We use time_interval to determine whether to use the temporal features for each sample, because within the same batch some samples may be the first frame of a scene while others are not. https://github.com/HorizonRobotics/Sparse4D/blob/main/projects/mmdet3d_plugin/models/instance_bank.py#L171-174
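To make the per-sample idea concrete, here is a minimal pure-Python sketch of gating cached features by time interval. It is not the repository's code: `ToyInstanceBank`, its methods, the string "features", and the 2-second threshold are all hypothetical, and the real implementation works on batched tensors. It only illustrates why no explicit reset is needed when scenes in a batch have different lengths.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyInstanceBank:
    """Hypothetical stand-in for an instance bank; not the Sparse4D class."""

    max_time_interval: float = 2.0  # assumed threshold in seconds
    cached_timestamps: Optional[List[float]] = None
    cached_features: Optional[List[str]] = None

    def get(
        self, timestamps: List[float], fresh: List[str]
    ) -> Tuple[List[str], List[bool]]:
        """Return per-sample features plus a mask of which samples reused the cache."""
        if self.cached_timestamps is None or len(self.cached_timestamps) != len(timestamps):
            # Nothing usable is cached yet: every sample starts from fresh features.
            return list(fresh), [False] * len(timestamps)
        # A sample reuses its cached features only if its previous frame is recent enough.
        mask = [
            abs(now - prev) <= self.max_time_interval
            for now, prev in zip(timestamps, self.cached_timestamps)
        ]
        features = [
            cached if use_cache else new
            for use_cache, cached, new in zip(mask, self.cached_features, fresh)
        ]
        return features, mask

    def cache(self, timestamps: List[float], features: List[str]) -> None:
        """Store the current frame's timestamps and features for the next step."""
        self.cached_timestamps = list(timestamps)
        self.cached_features = list(features)


bank = ToyInstanceBank()

# Step 1: both samples are at the first frame of their scenes.
_, mask1 = bank.get([0.0, 100.0], ["init", "init"])
bank.cache([0.0, 100.0], ["scene1_f0", "scene2_f0"])

# Step 2: both scenes continue, so both samples reuse the cache.
_, mask2 = bank.get([0.5, 100.5], ["init", "init"])
bank.cache([0.5, 100.5], ["scene1_f1", "scene2_f1"])

# Step 3: sample 0 has moved on to a new scene (large timestamp jump),
# while sample 1 is still inside its longer scene.
feats3, mask3 = bank.get([500.0, 101.0], ["init", "init"])
```

In this sketch the sample that jumps to a new scene falls back to fresh features on its own, while the other sample in the batch keeps its temporal features, so the shorter scene ending earlier does not affect the longer one.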
I see... Thank you so much for your quick response. Another question, not related to the topic of my main thread: for the Deformable Aggregation Function (DAF), v3 only does single-frame DAF, am I right? What I mean by "single frame" here is that the input to the DAF is multi-scale, multi-view camera image features within the same frame instead of multi-temporal camera image features.
Yes, you are right.
Thanks a lot for your quick response! It is really helpful.
Thank you for open-sourcing this code. This is really nice work. Let's say we are training with just two frames (Frame0 and Frame1). My understanding of the instance_bank is that you store Frame0's detection features in this instance_bank class and update part of Frame1's detection features with the stored Frame0 features to enhance the temporal features. May I know when you "reset" the instance bank during training, after going through a full scene and starting to train on a new scene? Thanks!