cambridgeltl / visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Apache License 2.0
90 stars, 7 forks