Closed xjtulien closed 5 months ago
Another question: I didn't understand how the query memory bank is reflected in the code. I noticed there is an apply_memory_bank function, but query_memory_bank is not defined in the code the way self.visual_memory_bank is. Is it possible to achieve memory-bank-like results purely through the attention layer? Thank you very much if you can take the time to answer my questions.
Thanks for pointing out this bug. I fixed the error and updated it in the latest commit. For the query memory bank, you can check the detailed code here: https://github.com/boheumd/MA-LMM/blob/main/lavis/models/blip2_models/blip2.py#L166
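For anyone else reading this thread: the general pattern being discussed — accumulating per-frame features into a bank and letting the current queries attend over it, so the "memory" effect comes from the attention layer itself — can be illustrated with a minimal NumPy sketch. This is only a toy illustration under those assumptions (the `MemoryBank` class, shapes, and `attend` method here are hypothetical and not the repo's actual implementation; see the linked code for the real one).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MemoryBank:
    """Toy memory bank: stores per-frame token features and serves
    them as keys/values for scaled dot-product attention."""
    def __init__(self, dim):
        self.dim = dim
        self.bank = np.zeros((0, dim))  # (num_stored_tokens, dim)

    def update(self, frame_feats):
        # append the current frame's token features to the bank
        self.bank = np.concatenate([self.bank, frame_feats], axis=0)

    def attend(self, queries):
        # queries: (num_queries, dim); attend over all banked tokens,
        # so past frames influence the output through attention alone
        scores = queries @ self.bank.T / np.sqrt(self.dim)
        return softmax(scores, axis=-1) @ self.bank

rng = np.random.default_rng(0)
mb = MemoryBank(dim=8)
for _ in range(3):                      # process three "frames"
    mb.update(rng.normal(size=(4, 8)))  # 4 tokens per frame
out = mb.attend(rng.normal(size=(2, 8)))
print(out.shape)  # (2, 8): one attended vector per query
```

The point of the sketch is that no separate "memory module" is needed: once past features sit in the key/value bank, a plain attention layer already mixes them into the current queries.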
In the Blip2VicunaInstruct_MALMM class, lines 235 and 237 do not calculate the image embeds in advance. Was this an oversight?