Describe the need of your request
I use mlx-lm to serve an HTTP API for generating text with any supported model. The HTTP API is intended to be similar to the OpenAI chat API.
It is very similar (if not the same as Ollama's); however, code completion is not working properly: the server does not seem to handle the stop token correctly, so generation runs past where it should end.
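
A minimal reproduction sketch of the kind of request involved, assuming an `mlx_lm.server` instance running on `localhost:8080` with the OpenAI-compatible `/v1/chat/completions` endpoint described in SERVER.md; the exact stop strings here are hypothetical placeholders, not ones confirmed to trigger the bug:

```python
import json

# Request body with explicit stop sequences; the reported problem is that
# generation does not appear to halt on the model's stop token even when
# stop strings are supplied.
payload = {
    "model": "default_model",  # placeholder model name
    "messages": [{"role": "user", "content": "def fib(n):"}],
    "stop": ["</s>", "\n\n"],  # hypothetical explicit stop strings
    "max_tokens": 128,
}
body = json.dumps(payload)

# Would be sent with, e.g.:
#   curl -X POST http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$BODY"
print(body)
```

With a correctly handled stop token, the completion should end at the first matching stop sequence instead of continuing to `max_tokens`.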
Proposed solution
No response
Additional context
More info:
https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/SERVER.md