Blaizzy / fastmlx

FastMLX is a high-performance, production-ready API for hosting MLX models.

Add support for tool calling #21

Closed by Blaizzy 1 month ago

Blaizzy commented 1 month ago

This PR enhances our system by adding support for tool calling in accordance with the OpenAI API specification.

Supported Models:

Supported Modes:

  - Streaming
  - Non-streaming
  - Parallel tool calling

API Example:

Here's a sample API request demonstrating the new tool-calling capabilities:

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in San Francisco and Washington?"
    }
  ],
  "tools": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location."
          }
        },
        "required": ["location", "format"]
      }
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false,
  "parallel_tool_calling": false
}'

This example illustrates how to request weather information for San Francisco and Washington, using the specified model and tool.
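For Python clients, the same request can be sent without extra dependencies using the standard library. This is a minimal sketch mirroring the curl example above; the endpoint URL and payload come straight from that example, and `send_chat_request` is a hypothetical helper name:

```python
import json
import urllib.request

# Same payload as the curl example; the tool schema describes a
# get_current_weather function for the model to call.
payload = {
    "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in San Francisco and Washington?",
        }
    ],
    "tools": [
        {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        }
    ],
    "max_tokens": 150,
    "temperature": 0.7,
    "stream": False,
    "parallel_tool_calling": False,
}


def send_chat_request(url="http://localhost:8000/v1/chat/completions"):
    """POST the payload to a running FastMLX server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `send_chat_request()` requires a FastMLX server listening on the given URL.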

Key Enhancements:

  1. Model Diversity: Support for a range of models ensures compatibility with various applications and user needs.
  2. Flexible Modes: Users can choose between streaming, non-streaming, and parallel tool calling modes to optimize performance and response times.
  3. Detailed Tool Integration: The ability to define tool parameters and descriptions allows for precise and effective tool usage.
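When the model decides to use a tool, the client is responsible for executing it and feeding the result back. The sketch below shows one way to dispatch tool calls to local functions. It assumes an OpenAI-style response shape where each tool call carries a function name and JSON-encoded arguments; the exact fields FastMLX returns should be checked against the actual response, and `get_current_weather` here is a stub:

```python
import json


def get_current_weather(location, format):
    # Stub implementation; a real client would query a weather service.
    # The parameter name "format" matches the tool schema above.
    return {"location": location, "temperature": 22, "unit": format}


# Registry mapping tool names (as declared in the request) to local callables.
TOOLS = {"get_current_weather": get_current_weather}


def dispatch_tool_calls(tool_calls):
    """Execute each tool call and collect the results in order."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append(fn(**args))
    return results


# Fabricated response fragment for illustration (shape assumed, not verified):
sample_calls = [
    {
        "function": {
            "name": "get_current_weather",
            "arguments": '{"location": "San Francisco, CA", "format": "celsius"}',
        }
    }
]
```

With `parallel_tool_calling` enabled, the model may emit several calls in one response; the loop above handles that case by running them sequentially.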