X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Apache License 2.0

Localizing positions of objects in a scene #13

Open rose-jinyang opened 9 months ago

rose-jinyang commented 9 months ago

Hello, how are you? Thanks for contributing to this project. In general, a scene contains more than one object. If there are multiple objects in a scene and they (e.g., people) perform different actions, we need to localize each object's position. Is it possible to localize the positions of all the objects described in a single video caption? If this is not possible right now, do you know of any solution or method for this purpose?