UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

`ContentImage.detail` is not passed down to the `OpenAIAPI`. #878

Closed tobiasraabe closed 2 hours ago

tobiasraabe commented 1 day ago

Hi everyone!

Thanks for the fantastic package! I use it for all my evaluations, and it has helped me a lot.

I was trying to use Inspect with images in my dataset and noticed that the `ContentImage.detail` parameter is not respected when responses are generated.

To reproduce

Use the task from examples/images but set the `detail` parameter of the images to "low". Run it with GPT-4o.

Notice that the token consumption is much higher than the 85 tokens expected at low detail. In the JSON of each sample, the `detail` parameter is set to "auto".

Changes

Three changes need to be made.

First, the `detail` parameter is not passed on in line 78. The condition `if isinstance(content.image, str):` is always true, so mypy treats the `else` block as unreachable and never type checks it. This is why the type error in `content.image.detail` goes undetected. Notice that the types become `Never` when you hover over them in VS Code. Enabling mypy's `warn_unreachable` option detects these cases.

https://github.com/UKGovernmentBEIS/inspect_ai/blob/0f342f7af6c73b8127cf613e849f55b45f273696/src/inspect_ai/dataset/_sources/util.py#L76-L83
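A minimal sketch of the pattern described above, not the actual code at the linked lines. The `ContentImage` dataclass and the `resolve_image` helper are hypothetical stand-ins, and the function names are invented for illustration. Because `image` is declared as `str`, the `isinstance` check always succeeds, the `else` branch is dead code, and mypy only reports it when run with `--warn-unreachable`.

```python
from dataclasses import dataclass
from typing import Literal

Detail = Literal["auto", "low", "high"]


@dataclass
class ContentImage:
    """Hypothetical stand-in for the library's image content type."""

    image: str
    detail: Detail = "auto"


def resolve_image(image: str) -> str:
    """Hypothetical stand-in for resolving a path or building a data URI."""
    return "resolved:" + image


def rewrite_buggy(content: ContentImage) -> ContentImage:
    # `image` is typed as `str`, so this condition is always true.
    if isinstance(content.image, str):
        # `detail` is not forwarded and falls back to the default "auto".
        return ContentImage(image=resolve_image(content.image))
    else:
        # Unreachable: mypy skips this branch unless --warn-unreachable
        # is set, so the invalid attribute access below goes unreported.
        return ContentImage(image=content.image, detail=content.image.detail)


def rewrite_fixed(content: ContentImage) -> ContentImage:
    # Drop the redundant branch and forward `detail` explicitly.
    return ContentImage(image=resolve_image(content.image), detail=content.detail)
```

With the branch removed, a `ContentImage` created with `detail="low"` keeps that value after rewriting, whereas the first version quietly resets it to "auto".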

Second, the same issue occurs here.

https://github.com/UKGovernmentBEIS/inspect_ai/blob/0f342f7af6c73b8127cf613e849f55b45f273696/src/inspect_ai/_eval/task/images.py#L96-L103

Third, line 422 sets `detail` to "auto" regardless of the value of the parameter.

https://github.com/UKGovernmentBEIS/inspect_ai/blob/0f342f7af6c73b8127cf613e849f55b45f273696/src/inspect_ai/model/_providers/openai.py#L419-L425
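As a rough illustration of the third point, the image part of an OpenAI chat completions message carries its own `detail` field, so a provider has to forward the caller's value instead of hardcoding "auto". The helper name `openai_image_part` below is hypothetical and is not the function in the linked file. It only shows the shape of the payload.

```python
from typing import Any, Literal

Detail = Literal["auto", "low", "high"]


def openai_image_part(image_url: str, detail: Detail = "auto") -> dict[str, Any]:
    """Build an image content part, forwarding `detail` as given."""
    return {
        "type": "image_url",
        "image_url": {"url": image_url, "detail": detail},
    }


# Example: request the low-detail encoding for one image.
part = openai_image_part("https://example.com/image.png", detail="low")
```

Keeping "auto" only as the default argument preserves the old behavior for callers that do not set `detail`.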

I can also push my changes and open a PR if removing the unnecessary if-else clauses in all three cases is the preferred solution.

jjallaire commented 2 hours ago

Thanks so much for reporting this! Fixed here: https://github.com/UKGovernmentBEIS/inspect_ai/pull/886