ch3cook-fdu / Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
MIT License

Thanks for your great work! I have some questions #13

Open Leon1207 opened 7 months ago

Leon1207 commented 7 months ago

Dear authors, I have a question about the lightweight caption head you proposed. How does it differ from existing captioning models in architecture and computational efficiency, such that it qualifies as a "lightweight" design? I look forward to your reply.

ch3cook-fdu commented 7 months ago

Nowadays, researchers often use large language models for image captioning. By contrast, we call our caption head a "light-weight" design because it is meant to make set-to-set training possible.

Leon1207 commented 7 months ago

Thank you very much for your reply! Your explanation makes perfect sense. On the other hand, methods like 3DJCG or D3Net don't use large models either, so would they count as lightweight too?

ch3cook-fdu commented 7 months ago

As long as they contain a small number of parameters, you can also call them "light-weight".
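
To make the parameter-count criterion concrete, here is a minimal sketch that counts the trainable parameters of a small transformer-decoder caption head analytically. It assumes a standard decoder layer (self-attention, cross-attention, feed-forward block, three layer norms, all with biases); the function names and every size below (`d_model`, `d_ff`, `vocab_size`, number of layers) are hypothetical placeholders, not the actual Vote2Cap-DETR, 3DJCG, or D3Net configurations.

```python
def attention_params(d_model: int) -> int:
    """Multi-head attention: Q, K, V and output projections, each with a bias."""
    return 4 * d_model * d_model + 4 * d_model


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """One standard transformer decoder layer (assumed layout, see lead-in)."""
    self_attn = attention_params(d_model)
    cross_attn = attention_params(d_model)
    # Two linear layers: d_model -> d_ff and d_ff -> d_model, with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Three layer norms, each with a scale and a shift vector.
    layer_norms = 3 * 2 * d_model
    return self_attn + cross_attn + feed_forward + layer_norms


def caption_head_params(num_layers: int, d_model: int, d_ff: int, vocab_size: int) -> int:
    """Word embedding + decoder layers + output projection over the vocabulary."""
    embedding = vocab_size * d_model
    output_proj = d_model * vocab_size + vocab_size
    return embedding + num_layers * decoder_layer_params(d_model, d_ff) + output_proj


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    total = caption_head_params(num_layers=2, d_model=256, d_ff=2048, vocab_size=3000)
    print(f"caption head parameters: {total:,}")
    # An illustrative 7-billion-parameter language model for scale.
    print(f"fraction of a 7B-parameter model: {total / 7e9:.5%}")
```

Under these assumed sizes the head has a few million parameters, orders of magnitude fewer than a billion-parameter language model. By the same count, any caption head in that range could reasonably be called "light-weight".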

Leon1207 commented 7 months ago

Thanks!