UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

[Feature request] Expose private elements #678

Open rusheb opened 2 days ago

rusheb commented 2 days ago

There are a number of private elements we have been using. It would be great if these could be exposed in the public API.

Low-level tool API for defining dynamic tools:

from inspect_ai.model._call_tools import ToolDef, tool_def

Looking up agents from the registry in order to select agents via CLI:

from inspect_ai._util.registry import registry_lookup
from inspect_ai._util.registry import registry_name, registry_unqualified_name

Defining a custom provider for o1:

from inspect_ai.model._model_call import ModelCall
from inspect_ai.model._providers.openai import OpenAIAPI
from inspect_ai.model._providers.util import (
    ChatAPIHandler,
    ChatAPIMessage,
    as_stop_reason,
    chat_api_input,
)

Some of these we could probably just duplicate in our own codebase. For others that would not work, namely where we are interacting with Inspect components (e.g. ToolDef).

Let us know which ones you think make sense to make public.

jjallaire-aisi commented 2 days ago

Could you provide a short code snippet demonstrating how you use ToolDef? It may be that we should just export this but perhaps seeing your example will give me another idea for making the scenario work.

On the registry, we do have a --solver CLI option and a solver= option for eval() that are intended to do this already. Note that these take a SolverSpec, which can just be a simple string name (and the registry lookup / create is done for you). Dynamically binding to agents by registry name should be a first-class feature, so if it can't be done flexibly enough w/ what we have I'd be keen to adapt it. I don't (yet) want to make the registry functions public b/c I think we might want a higher-level / better type-checked version of them.
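
To make the idea of binding to an agent by name concrete, here is a minimal, self-contained sketch of the general pattern. It deliberately does not use Inspect's registry, SolverSpec, or eval(); `register_agent`, `create_agent`, and the two example agents are hypothetical names invented for illustration, and the real registry may well behave differently.

```python
from typing import Callable, Dict

# Hypothetical stand-in registry for illustration only.
# This is NOT Inspect's internal registry or its API.
Agent = Callable[[str], str]
AgentFactory = Callable[..., Agent]

_AGENTS: Dict[str, AgentFactory] = {}


def register_agent(name: str) -> Callable[[AgentFactory], AgentFactory]:
    """Decorator that registers an agent factory under a string name."""

    def decorator(factory: AgentFactory) -> AgentFactory:
        if name in _AGENTS:
            raise ValueError(f"agent {name!r} is already registered")
        _AGENTS[name] = factory
        return factory

    return decorator


def create_agent(name: str, **kwargs) -> Agent:
    """Look up a factory by name and call it with the given arguments."""
    try:
        factory = _AGENTS[name]
    except KeyError:
        known = ", ".join(sorted(_AGENTS)) or "<none>"
        raise LookupError(f"unknown agent {name!r}; registered: {known}") from None
    return factory(**kwargs)


@register_agent("echo")
def echo_agent(prefix: str = "") -> Agent:
    # Returns the prompt unchanged, optionally with a prefix.
    def run(prompt: str) -> str:
        return f"{prefix}{prompt}"

    return run


@register_agent("shout")
def shout_agent() -> Agent:
    # Returns the prompt in upper case.
    def run(prompt: str) -> str:
        return prompt.upper()

    return run


# The caller only needs the string name, e.g. one taken from a CLI flag.
agent = create_agent("echo", prefix="> ")
print(agent("hello"))
```

The point of the sketch is that the caller passes only a name (plus optional arguments) and never imports the factory directly, which is the behaviour the --solver option and solver= argument are described as providing.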

On the last one re: custom chat providers, I agree that it's a hole that we don't export these, as they are used pretty frequently, especially in our REST API based custom providers. The reason they aren't exported is that I don't think they are terribly well thought through (i.e. they're a little janky and incomplete) and I don't want to commit to them long term in this form. In this case I'd recommend just doing the private imports for now until we can set aside time to create a proper "toolkit" for people authoring custom model providers.