arijitray1993 / COLA

COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
MIT License
22 stars, 0 forks