[Inference API] Add unified API for chat completions

maxhniebergall commented 2 days ago

A proof of concept (POC) of the unified API, communicating with OpenAI.

Testing

Running ES

The _unified route is behind a feature flag. To enable it, run Elasticsearch like this:

./gradlew :run -Drun.license_type=trial -Des.inference_unified_feature_flag_enabled=true

Creating an endpoint and sending requests

Creating a completion endpoint

PUT http://localhost:9200/_inference/completion/test
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "gpt-4o"
    }
}
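The endpoint-creation call above can also be scripted. The following is a minimal sketch using only Python's standard library (urllib). It builds the PUT request without sending it so it can be inspected first. The helper name build_create_endpoint_request is hypothetical, the host and endpoint ID test are taken from the example, the API key is a placeholder, and no authentication headers are added (add them if your cluster requires them).

```python
import json
import urllib.request


# Hypothetical helper: builds (but does not send) the PUT request that
# creates an OpenAI-backed completion inference endpoint.
def build_create_endpoint_request(
    api_key: str,
    model_id: str = "gpt-4o",
    endpoint_id: str = "test",
    base_url: str = "http://localhost:9200",
) -> urllib.request.Request:
    body = {
        "service": "openai",
        "service_settings": {
            "api_key": api_key,
            "model_id": model_id,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/_inference/completion/{endpoint_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_endpoint_request(api_key="<api key>")
    print(req.get_method(), req.full_url)
    # To actually create the endpoint against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to check the URL and body before touching a live cluster.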

Completion request

POST http://localhost:9200/_inference/completion/test/_unified
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Boston today?"
        }
    ],
    "stop": "none",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}
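The _unified request can be assembled the same way. This is a minimal sketch using only the Python standard library, assuming the test endpoint created earlier. The helper names weather_tool and build_unified_request are hypothetical, and the body mirrors the fields of the example (model, messages, stop, tools, tool_choice). The request is built but not sent, and because the response format is still in progress, the sketch does not parse a response.

```python
import json
import urllib.request
from typing import Any, Optional


# Hypothetical helper: the function-calling tool definition from the example.
def weather_tool() -> dict[str, Any]:
    return {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }


# Hypothetical helper: builds (but does not send) a POST to the _unified route.
def build_unified_request(
    question: str,
    tools: list[dict[str, Any]],
    model: str = "gpt-4o",
    stop: Optional[str] = None,
    endpoint_id: str = "test",
    base_url: str = "http://localhost:9200",
) -> urllib.request.Request:
    body: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": tools,
        "tool_choice": "auto",
    }
    if stop is not None:
        # Only include "stop" when the caller passes one explicitly.
        body["stop"] = stop
    return urllib.request.Request(
        url=f"{base_url}/_inference/completion/{endpoint_id}/_unified",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_unified_request(
        "What is the weather like in Boston today?", tools=[weather_tool()]
    )
    print(req.get_method(), req.full_url)
```

In this sketch stop is optional and omitted by default; pass stop="none" to reproduce the example body exactly.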

Response format

Implementing the response format is still in progress.

elasticsearchmachine commented 2 days ago

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine commented 2 days ago

Hi @maxhniebergall, I've created a changelog YAML for you.

maxhniebergall commented 2 days ago

Tests to add:

- UnifiedCompletionAction
- TestStreamingCompletionServiceExtension
- Rolling update tests
- TransportInferenceActionTests
- InferenceInputs (@jonathan-buttner)
  - We should double-check the castTo method and add it to the other subclasses.
- OpenAiRequestManager?
- UnifiedChatInput
  - For the conversions.
- OpenAiUnifiedCompletionRequestEntity
  - Max has already started on this.
- BaseInferenceAction
- RestUnifiedCompletionInferenceAction
- OpenAiService
- OpenAiChatCompletionModel

Address outstanding TODOs

elastic / elasticsearch