Closed · Yikai1Wang closed this issue 1 month ago
Hi, the code for LLaViLo is not publicly available at the moment. The anchor tokens serve the same purpose as the input query tokens of the DETR decoder: you concatenate them with your language instruction/query and video embeddings to form the LLM input, then apply the moment-localization loss to those tokens' output representations.
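Since the official code is unreleased, here is a minimal sketch of the idea described above: learnable anchor tokens (analogous to DETR decoder queries) are concatenated with text and video embeddings, the sequence is run through the backbone, and a moment head predicts a span from each anchor token's output. All names, dimensions, the toy transformer standing in for the LLM, and the plain L1 loss (real DETR-style training uses Hungarian matching) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AnchorMomentSketch(nn.Module):
    """Illustrative only: anchor tokens + moment head, not LLaViLo's code."""

    def __init__(self, d_model=256, num_anchors=10):
        super().__init__()
        # Learnable anchor tokens, playing the role of DETR decoder queries.
        self.anchors = nn.Parameter(torch.randn(num_anchors, d_model))
        # Small transformer as a stand-in for the (adapter-tuned) LLM backbone.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Predict a normalized (center, width) moment per anchor token.
        self.moment_head = nn.Linear(d_model, 2)

    def forward(self, text_emb, video_emb):
        # text_emb: (B, L_text, D); video_emb: (B, L_video, D)
        b = text_emb.size(0)
        anchors = self.anchors.unsqueeze(0).expand(b, -1, -1)
        # Concatenate anchor tokens with language and video embeddings.
        x = torch.cat([anchors, text_emb, video_emb], dim=1)
        h = self.backbone(x)
        # Read off the backbone outputs at the anchor positions only.
        anchor_out = h[:, : self.anchors.size(0)]
        return self.moment_head(anchor_out).sigmoid()  # (B, num_anchors, 2)

# Toy usage: one sample, 5 text tokens, 20 video frame tokens.
model = AnchorMomentSketch()
pred = model(torch.randn(1, 5, 256), torch.randn(1, 20, 256))
target = torch.tensor([[[0.3, 0.2]]])  # one ground-truth (center, width)
# Stand-in loss: L1 against the first anchor's prediction. A faithful
# DETR-style setup would Hungarian-match anchors to ground-truth moments.
loss = nn.functional.l1_loss(pred[:, :1], target)
```

The key point the sketch conveys is that only the output states at the anchor positions feed the localization loss; the text and video positions just provide context through attention.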
(Original question, translated from Chinese:) Hello, I recently read your paper LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling. Could you explain what the "additional anchor tokens" refer to? Is there implementation code I could consult?