Closed s-smits closed 2 months ago
See title. A q4 version would be great as well. https://huggingface.co/xtuner/llava-phi-3-mini-hf
Thanks a lot for adding it here :)
Quantisation already works. Once the model script is ready, you can just run:
python -m mlx_vlm.convert \
--hf-path xtuner/llava-phi-3-mini-hf \
-q \
--upload-repo mlx-community/llava-phi-3-mini-hf-4bit
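Once the conversion finishes (and uploads, given --upload-repo), you should be able to test it straight from the CLI. A minimal sketch, assuming the generate entry point and its flags match the current release; the image path and prompt are placeholders:

python -m mlx_vlm.generate \
--model mlx-community/llava-phi-3-mini-hf-4bit \
--image path/to/image.jpg \
--prompt "Describe this image." \
--max-tokens 100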
Are you interested in adding a PR for this model?
I can try; I'm quite new to the MLX space, so this will take a while.
https://github.com/Blaizzy/mlx-vlm/pull/12 that was fast! 🚀
You can use the pre-quantized models already on the Hub:
https://huggingface.co/mlx-community/llava-phi-3-mini-4bit
https://huggingface.co/mlx-community/llava-llama-3-8b-v1_1-4bit
Just install the latest version:
pip install -U mlx-vlm
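If you'd rather call it from Python than the CLI, something like the following should work with the pre-quantized checkpoints above. This is a rough sketch based on the high-level API; the exact load/generate signatures can shift between releases, and the prompt may need the model's chat template applied first:

from mlx_vlm import load, generate

# Download the 4-bit weights and processor from the Hub and load them.
model, processor = load("mlx-community/llava-phi-3-mini-4bit")

# Single image + prompt; argument order assumed from the README examples.
output = generate(model, processor, "path/to/image.jpg", "Describe this image.", max_tokens=100)
print(output)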
Great, thank you!
Most welcome!