Open duncancalvert opened 1 month ago
Recommended vision language models/approaches:
- Benchmarks:
Vision understanding evals: OpenAI reports that GPT-4o achieves state-of-the-art performance on visual perception benchmarks. All vision evals are 0-shot; MMMU, MathVista, and ChartQA use 0-shot chain-of-thought (CoT) prompting. Reference: https://openai.com/index/hello-gpt-4o/
Closed Architecture:
Open Source:
Papers to review: