Add support for token streaming, parallel jobs and custom CORS

arcee-ai / fastmlx

FastMLX is a high-performance, production-ready API to host MLX models.


Closed Blaizzy closed 4 months ago

Blaizzy commented 4 months ago

This PR adds:

Multi-modal token streaming.
Support for parallel calls (single and multiple models), by default up to N workers.
Supported model type endpoint.
Delete model endpoint.
Custom CORS.
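
For reviewers, here is a rough, standard-library-only sketch of how token streaming and a capped worker pool can fit together. It is not the code in this PR: `fake_generate`, `stream_chunks`, `run_parallel`, and `MAX_WORKERS` are hypothetical names, the model is replaced by a whitespace tokenizer, and the `data: ...` framing with a `[DONE]` sentinel is an assumed server-sent-events style, not something the description confirms.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List

MAX_WORKERS = 2  # hypothetical cap, standing in for "N workers"


def fake_generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one token at a time."""
    for token in prompt.split():
        yield token


def stream_chunks(prompt: str) -> Iterator[str]:
    """Wrap each token in an SSE-style line as soon as it is produced."""
    for token in fake_generate(prompt):
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Assumed end-of-stream sentinel so clients know when to stop reading.
    yield "data: [DONE]\n\n"


def run_parallel(prompts: List[str]) -> List[List[str]]:
    """Handle several requests concurrently with at most MAX_WORKERS threads."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # pool.map preserves input order; each worker drains one stream.
        return list(pool.map(lambda p: list(stream_chunks(p)), prompts))


if __name__ == "__main__":
    for chunk in stream_chunks("hello streaming world"):
        print(chunk, end="")
```

In a real server the generator would be handed to the framework's streaming response type rather than collected into a list; the list here only keeps the sketch easy to check.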

Todo:

Closes #2, Closes #5