Blaizzy / fastmlx

FastMLX is a high performance production ready API to host MLX models.

FastMLX Python Client #23

Open Blaizzy opened 1 month ago

Blaizzy commented 1 month ago

Feature Description

Implement a FastMLX client that allows users to specify custom server settings, including base URL, port, and number of workers. This feature will provide greater flexibility for users who want to run the FastMLX server with specific configurations.

Proposed Implementation

  1. Modify the FastMLX class constructor to accept additional parameters (base_url and workers, as shown under Example Usage).

  2. Update the FastMLXClient class to:

    • Parse the base_url to extract host and port
    • Store the workers parameter
    • Use these values when starting the server

  3. Modify the start_fastmlx_server function to accept host, port, and workers as parameters.

  4. Update the ensure_server_running method in FastMLXClient to use the custom settings when starting the server.
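
The base_url parsing in step 2 could look roughly like the sketch below. This is an illustration only, not FastMLX code: ServerSettings and parse_server_settings are hypothetical names, and the fallback port of 8000 is an assumption for the case where base_url carries no port.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ServerSettings:
    """Hypothetical container for the values the client would pass to the server."""
    host: str
    port: int
    workers: int


def parse_server_settings(base_url: str, workers: int = 1, default_port: int = 8000) -> ServerSettings:
    """Split base_url into host and port; fall back to default_port (an assumption) if none is given."""
    parsed = urlparse(base_url)
    if parsed.hostname is None:
        # urlparse finds no host when the scheme is missing, e.g. "localhost:8080"
        raise ValueError(f"base_url must include a scheme and host: {base_url!r}")
    port = parsed.port if parsed.port is not None else default_port
    return ServerSettings(host=parsed.hostname, port=port, workers=workers)


settings = parse_server_settings("http://localhost:8080", workers=4)
print(settings)
```

The resulting host, port, and workers values are what steps 3 and 4 would hand to the server start function.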

Example Usage

from fastmlx import FastMLX

client = FastMLX(
    api_key="your-api-key",
    base_url="http://localhost:8080",  # Custom port
    workers=4  # Custom number of workers
)

# Use the client...

client.close()

# Or use as a context manager
with FastMLX(api_key="your-api-key", base_url="http://localhost:8080", workers=4) as client:
    # Your code here
    pass
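
For readers wondering how close() and the context manager form could tie into the server lifecycle, here is a minimal sketch. It assumes the client owns a server child process; ManagedServer is a hypothetical class, not part of FastMLX, and the placeholder command stands in for the real server command so the sketch runs anywhere.

```python
import subprocess
import sys


class ManagedServer:
    """Hypothetical wrapper: starts a server command on creation, stops it on close()."""

    def __init__(self, command: list[str]):
        self._process = subprocess.Popen(command)

    @property
    def running(self) -> bool:
        # poll() returns None while the child process is still alive
        return self._process.poll() is None

    def close(self, grace: float = 10.0) -> None:
        """Stop the child process; safe to call more than once."""
        if not self.running:
            return
        self._process.terminate()
        try:
            self._process.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # The child ignored the request, so force it to stop
            self._process.kill()
            self._process.wait()

    def __enter__(self) -> "ManagedServer":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


# Stand-in command (an assumption) so the sketch runs without FastMLX installed.
placeholder = [sys.executable, "-c", "import time; time.sleep(60)"]

with ManagedServer(placeholder) as server:
    print("running inside the block:", server.running)
print("running after the block:", server.running)
```

Routing __exit__ through close() means both usage styles shown above share one shutdown path.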

Benefits

Potential Challenges

Tasks

Questions

Please provide any feedback or suggestions on this proposed implementation.

stewartugelow commented 4 weeks ago

Q: Why do you need a standalone client? Couldn't you set all of these variables by API?

Blaizzy commented 3 weeks ago

Yes, you can set the variables.

But this would help if you want to programmatically start and stop the server.

Imagine something like the OpenAI/Anthropic Python client.

stewartugelow commented 3 weeks ago

I ask out of complete ignorance, but would one of the following approaches from ChatGPT work?


Using Python's subprocess module to start and stop a FastAPI server programmatically can work, but there are a few considerations, trade-offs, and potentially better alternatives. Here's an overview of what to keep in mind:

Using subprocess to Start/Stop a FastAPI Server

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. Using subprocess to start a FastAPI server typically involves launching the server in a separate process, and you can control it (e.g., stop it) by managing the process.

Example of starting a FastAPI server with subprocess:

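import subprocess

# Start the FastAPI server in a child process (Popen returns immediately)
process = subprocess.Popen(["uvicorn", "app:app", "--host", "127.0.0.1", "--port", "8000"])

# Stop the FastAPI server: request termination first, force kill only on timeout
process.terminate()
try:
    process.wait(timeout=10)
except subprocess.TimeoutExpired:
    process.kill()
    process.wait()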

Considerations:

  1. Process Management: When using subprocess, you're working with a separate process. Managing the lifecycle of this process requires careful handling, especially around shutdown, cleanup, and ensuring that the process is terminated properly.

  2. Error Handling: You need to capture and handle any potential errors that arise from starting the subprocess. For example, the FastAPI server might fail to start due to port conflicts, missing dependencies, or invalid configurations.

  3. Blocking Behavior: subprocess.Popen returns immediately, but subprocess.run() and process.wait() block until the server exits. If your main program needs to continue running, use Popen and check on the process later, or wait on it in a separate thread.

  4. Cross-Platform Compatibility: If you plan to run your FastAPI server on different platforms (e.g., Windows, Linux), ensure that your subprocess code accounts for platform-specific behavior, such as differences in process termination or command-line syntax.

  5. Graceful Shutdown: On POSIX systems, .terminate() sends SIGTERM, which Uvicorn handles by shutting down gracefully, while .kill() sends SIGKILL and stops the process immediately, which can lead to problems like unsaved data, incomplete responses, or locked resources. On Windows, both calls end the process abruptly.
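
The error-handling point (a server that fails to start, for example because of a port conflict) can be sketched as a readiness check. This is a minimal sketch under stated assumptions: wait_for_port is a hypothetical helper, and Python's built-in http.server stands in for the FastAPI server so the example runs without extra packages.

```python
import socket
import subprocess
import sys
import time


def wait_for_port(host: str, port: int, process: subprocess.Popen, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts connections, False on timeout or if the child exits first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process.poll() is not None:
            return False  # the child already exited, e.g. because the port was taken
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # not listening yet, retry shortly
    return False


# Ask the OS for a currently free port to use in the demo.
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    free_port = probe.getsockname()[1]

# Stand-in server (an assumption): replace with the real server command.
server = subprocess.Popen(
    [sys.executable, "-m", "http.server", str(free_port), "--bind", "127.0.0.1"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
ready = wait_for_port("127.0.0.1", free_port, server)
print("server ready:", ready)

# Clean up the child process.
server.terminate()
server.wait()
```

Checking process.poll() inside the loop separates "still starting" from "already crashed", so the caller does not sit through the full timeout when startup fails immediately.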

Alternatives to subprocess

There are other methods to start and stop a FastAPI server programmatically that may offer better control or integration with your Python application:

1. Direct Integration via uvicorn.run()

Instead of starting a new process, you can run the FastAPI app directly in your Python code using uvicorn.run(). This keeps everything in the same process and allows for tighter control and better integration with your existing application logic.

Example:

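import uvicorn
from threading import Thread

# Keep a handle on the Server object so it can be stopped later
config = uvicorn.Config("app:app", host="127.0.0.1", port=8000)
server = uvicorn.Server(config)

# Start FastAPI server in a separate thread
thread = Thread(target=server.run)
thread.start()

# ... later, when the server is no longer needed ...

# Stop the server: ask it to exit, then wait for the thread to finish
server.should_exit = True
thread.join()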

Advantages:

Disadvantages:

2. Using an Asyncio Event Loop

If your application uses asyncio, you can start and stop the FastAPI server within an asyncio event loop. This allows for non-blocking execution and more flexible server management.

Example:

import asyncio
import uvicorn

async def main():
    config = uvicorn.Config("app:app", host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)

    # Run the server as a background task on the current event loop
    task = asyncio.create_task(server.serve())

    # ... do other async work while the server is running ...

    # Stop the server: signal it to exit, then wait for serve() to return
    server.should_exit = True
    await task

asyncio.run(main())

Advantages:

3. Using Gunicorn with Uvicorn Workers

If you need to run the FastAPI app in production, you might consider using gunicorn with Uvicorn workers. You can control Gunicorn programmatically as well, although it adds complexity to manage the Gunicorn process.

Example:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app

Gunicorn handles process management, and you can programmatically start and stop it by managing the Gunicorn process.
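
Managing the Gunicorn process from Python could be sketched as follows. The helper names gunicorn_command and start_gunicorn are hypothetical, the host and port defaults are assumptions carried over from the earlier examples, and the sketch assumes gunicorn and uvicorn are installed on a POSIX system. Only the command is built here; nothing is launched.

```python
import subprocess


def gunicorn_command(app: str, workers: int = 4, host: str = "127.0.0.1", port: int = 8000) -> list[str]:
    """Build the argument list for Gunicorn with Uvicorn workers."""
    return [
        "gunicorn",
        "-w", str(workers),                     # number of worker processes
        "-k", "uvicorn.workers.UvicornWorker",  # worker class that serves ASGI apps
        "-b", f"{host}:{port}",                 # address to bind to
        app,
    ]


def start_gunicorn(app: str, **options) -> subprocess.Popen:
    """Launch the Gunicorn master as a child process and return its handle."""
    return subprocess.Popen(gunicorn_command(app, **options))


# Show the command without launching anything.
print(" ".join(gunicorn_command("app:app")))
```

Calling terminate() on the returned handle sends SIGTERM to the Gunicorn master, which Gunicorn treats as a request for graceful shutdown of its workers.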

Advantages:

Disadvantages:

Conclusion

Using subprocess to start and stop a FastAPI server is a valid option, but it requires careful process management and error handling. Depending on your use case, integrating FastAPI directly using uvicorn.run() or within an asyncio event loop might be more efficient and offer tighter control.

Each method has its trade-offs, so choose the one that best fits your application's architecture and requirements.