results after matchit scoring update (commercial models)

clembench / clembench-runs

All outputs generated by running the benchmark on different versions

MIT License


results after matchit scoring update (commercial models) #41

Closed kushal-10 closed 3 weeks ago

kushal-10 commented 3 weeks ago

Add results for the matchit pentomino instances for claude3, gemini1.5 flash, gpt 4o and gpt4 vision. Ran transcribe + score. Updated result files and additional score files for the above 4 models.