X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Apache License 2.0

Localizing positions of objects in a scene #13

Open · rose-jinyang opened this issue 1 year ago

rose-jinyang commented 1 year ago

Hello, how are you? Thanks for contributing to this project. In general, a scene contains more than one object. When a scene has multiple objects and their actions differ (e.g., several people each doing something different), we need to localize each object's position. Is it possible to localize the positions of all the objects for a single video caption? If that isn't possible right now, do you know of any solution or method for this purpose?