Blaizzy / fastmlx

FastMLX is a high-performance, production-ready API for hosting MLX models.

Add support for tool calling #21

Closed by Blaizzy 1 month ago

Blaizzy commented 1 month ago

This PR enhances our system by adding support for tool calling in accordance with the OpenAI API specification.

Supported Models:

Supported Modes:

  - Streaming
  - Non-streaming
  - Parallel tool calling

API Example:

Here's a sample API request demonstrating the new tool-calling capabilities:

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in San Francisco and Washington?"
    }
  ],
  "tools": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location."
          }
        },
        "required": ["location", "format"]
      }
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false,
  "parallel_tool_calling": false
}'

This example illustrates how to request weather information for San Francisco and Washington, using the specified model and tool.
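For Python clients, the same request can be sent without extra dependencies using the standard library. This is a minimal sketch mirroring the curl example above; the endpoint URL and payload come straight from that example, and `send_chat_request` is a hypothetical helper name:

```python
import json
import urllib.request

# Same payload as the curl example; the tool schema describes a
# get_current_weather function for the model to call.
payload = {
    "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in San Francisco and Washington?",
        }
    ],
    "tools": [
        {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        }
    ],
    "max_tokens": 150,
    "temperature": 0.7,
    "stream": False,
    "parallel_tool_calling": False,
}


def send_chat_request(url="http://localhost:8000/v1/chat/completions"):
    """POST the payload to a running FastMLX server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `send_chat_request()` requires a FastMLX server listening on the given URL.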

Key Enhancements:

  1. Model Diversity: Support for a range of models ensures compatibility with various applications and user needs.
  2. Flexible Modes: Users can choose between streaming, non-streaming, and parallel tool calling modes to optimize performance and response times.
  3. Detailed Tool Integration: The ability to define tool parameters and descriptions allows for precise and effective tool usage.
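When the model decides to use a tool, the client is responsible for executing it and feeding the result back. The sketch below shows one way to dispatch tool calls to local functions. It assumes an OpenAI-style response shape where each tool call carries a function name and JSON-encoded arguments; the exact fields FastMLX returns should be checked against the actual response, and `get_current_weather` here is a stub:

```python
import json


def get_current_weather(location, format):
    # Stub implementation; a real client would query a weather service.
    # The parameter name "format" matches the tool schema above.
    return {"location": location, "temperature": 22, "unit": format}


# Registry mapping tool names (as declared in the request) to local callables.
TOOLS = {"get_current_weather": get_current_weather}


def dispatch_tool_calls(tool_calls):
    """Execute each tool call and collect the results in order."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append(fn(**args))
    return results


# Fabricated response fragment for illustration (shape assumed, not verified):
sample_calls = [
    {
        "function": {
            "name": "get_current_weather",
            "arguments": '{"location": "San Francisco, CA", "format": "celsius"}',
        }
    }
]
```

With `parallel_tool_calling` enabled, the model may emit several calls in one response; the loop above handles that case by running them sequentially.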