clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark

Add model OpenAI gpt-4-vision-preview #89

Closed yanweiser closed 1 month ago

yanweiser commented 2 months ago

I have extended the OpenAI backend and model_registry to support OpenAI's multimodal version of gpt-4. This also required a small change to the way messages are added: users can now pass images (file path or link) to the add_message function via the optional keyword 'image'. I am using keyword arguments here to make it easy to extend to other modalities (video, audio, ...) and combinations of them. While most open models only support one image per user message, gpt-4v accepts multiple images in a single message, so the value of the 'image' argument can be either a string or a list of strings. Since the format in which messages are passed to the model differs from other OpenAI models, I added a new value to the model_registry which indicates whether this other message format needs to be applied.
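For illustration, a minimal sketch of what this message handling could look like; the class, helper, and key names below are placeholders of mine, not the actual clembench backend code:

```python
from typing import List, Optional, Union

class Dialogue:
    """Illustrative only; the real clembench player/backend classes differ."""

    def __init__(self):
        self.messages = []

    def add_message(self, role: str, text: str,
                    image: Optional[Union[str, List[str]]] = None):
        """Append a message; 'image' may be a single path/URL or a list of them."""
        message = {"role": role, "content": text}
        if image is not None:
            # Normalise to a list so downstream formatting has only one case.
            message["image"] = [image] if isinstance(image, str) else image
        self.messages.append(message)


def to_vision_format(messages):
    """Convert plain messages into the content-parts format that
    gpt-4-vision-preview expects (text part plus one image_url part per image)."""
    formatted = []
    for msg in messages:
        if "image" not in msg:
            formatted.append({"role": msg["role"], "content": msg["content"]})
            continue
        parts = [{"type": "text", "text": msg["content"]}]
        for img in msg["image"]:
            parts.append({"type": "image_url", "image_url": {"url": img}})
        formatted.append({"role": msg["role"], "content": parts})
    return formatted
```

The backend would then call something like to_vision_format only when the model_registry entry flags the model as needing the multimodal message format, and pass plain messages through unchanged otherwise.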

phisad commented 1 month ago

Hi @yanweiser did you close this because of the other PR?

yanweiser commented 1 month ago

Oh yeah, I should have let you know. I talked to Sherzod yesterday, and he has implemented a multimodal backend covering gpt-4-vision, the new gpt-4o, and the Anthropic Claude models. Instead of adding one model backend at a time, he will just add them all later on.