YouHuang67 / InterFormer

MIT License

Training Flow #4

Closed: che1007 closed this issue 10 months ago

che1007 commented 10 months ago

In SimpleClick, each image undergoes 1 to 3 iterations. In each of these iterations, the previous output and the new coordinate features are passed through the entire model. In your architecture, the previous output and the new coordinate features are fed into 'Feature Decoding' instead. So I believe that the image features produced by 'Feature Encoding' are reused across the iterations for one image. Is my understanding correct?
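The contrast in the question can be sketched as a toy Python comparison that only counts how often each stage runs. The functions `backbone`, `light_head`, `full_model_flow` and `encode_once_flow` are hypothetical stand-ins, not code from SimpleClick or InterFormer, and the scalar "features" and "masks" are placeholders. The key assumption is that in the full-model flow the clicks and previous mask enter the heavy backbone, so that backbone cannot be cached.

```python
from typing import Dict, List, Tuple

Click = Tuple[int, int]


def backbone(inputs: List[float], counter: Dict[str, int]) -> float:
    """Hypothetical stand-in for the heavy encoder; returns a toy scalar feature."""
    counter["backbone"] += 1
    return sum(inputs)


def light_head(features: float, clicks: List[Click], prev_mask: float,
               counter: Dict[str, int]) -> float:
    """Hypothetical stand-in for a light stage that fuses clicks and the previous mask."""
    counter["head"] += 1
    return features + len(clicks) + 0.5 * prev_mask


def full_model_flow(image: List[float], click_seq: List[Click]) -> Dict[str, int]:
    # Clicks and the previous mask are combined with the image at the input,
    # so the backbone is re-run on every iteration.
    counter = {"backbone": 0, "head": 0}
    clicks: List[Click] = []
    mask = 0.0
    for click in click_seq:
        clicks.append(click)
        click_map = [float(len(clicks))]  # toy encoding of the clicks so far
        features = backbone(image + click_map + [mask], counter)
        mask = light_head(features, clicks, mask, counter)
    return counter


def encode_once_flow(image: List[float], click_seq: List[Click]) -> Dict[str, int]:
    # Only the image goes through the backbone, once; clicks and the previous
    # mask enter the light decoding stage, which reuses the cached features.
    counter = {"backbone": 0, "head": 0}
    features = backbone(image, counter)
    clicks: List[Click] = []
    mask = 0.0
    for click in click_seq:
        clicks.append(click)
        mask = light_head(features, clicks, mask, counter)
    return counter


clicks = [(10, 12), (40, 7), (25, 30)]  # three interaction iterations
print(full_model_flow([0.1, 0.2, 0.3], clicks))   # {'backbone': 3, 'head': 3}
print(encode_once_flow([0.1, 0.2, 0.3], clicks))  # {'backbone': 1, 'head': 3}
```

With three iterations, the full-model flow pays for three backbone passes, while the encode-once flow pays for one.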

YouHuang67 commented 10 months ago

Yes, you are correct. In the InterFormer architecture, each image goes through 'Feature Encoding' only once. The encoded features are then reused for every subsequent interaction, whichever object in the image is being segmented. As a result, the image features are reused heavily, especially when several objects are annotated in the same image, which significantly improves processing efficiency.
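The reuse described above, where one encoding serves many interactions and many objects, can be illustrated with a small cache. `FeatureCache` and `annotate` are hypothetical names, and the scalar features and mask update are toy placeholders rather than InterFormer's actual decoding. The sketch only shows that the encoding cost is paid once per image while the decoding cost grows with the total number of clicks.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Click = Tuple[int, int]


class FeatureCache:
    """Hypothetical helper: encode each image once and reuse the result."""

    def __init__(self, encode_fn: Callable[[List[float]], float]) -> None:
        self._encode_fn = encode_fn
        self._cache: Dict[Hashable, float] = {}
        self.encoder_calls = 0

    def get(self, image_id: Hashable, image: List[float]) -> float:
        # Run the (expensive) encoder only on a cache miss.
        if image_id not in self._cache:
            self.encoder_calls += 1
            self._cache[image_id] = self._encode_fn(image)
        return self._cache[image_id]


def annotate(cache: FeatureCache, image_id: Hashable, image: List[float],
             objects: List[List[Click]]) -> int:
    """Segment several objects in one image; returns the number of decoding steps."""
    decoder_calls = 0
    for clicks in objects:  # one click sequence per object
        features = cache.get(image_id, image)  # cache hit after the first object
        mask = 0.0
        for i in range(1, len(clicks) + 1):
            # Toy stand-in for one decoding step using the first i clicks.
            mask = features + i + 0.5 * mask
            decoder_calls += 1
    return decoder_calls


cache = FeatureCache(encode_fn=sum)  # toy "encoder"
objects = [[(5, 5), (8, 9)], [(20, 3)], [(1, 1), (2, 2), (3, 3)]]
calls = annotate(cache, "img_0", [0.1, 0.2, 0.3], objects)
print(cache.encoder_calls, calls)  # 1 6
```

Three objects with six clicks in total trigger six decoding steps but a single encoding pass.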

Thank you for your interest in our work.