Open carlosgjs opened 9 months ago
Using the HF pipeline with bitsandbytes quantization doesn't work on MPS yet. However, the llama.cpp runtime works well on a Mac, so we can use it there instead. We need to load and use a runtime dynamically, based on the platform.
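One possible shape for the platform-based selection, as a rough sketch only. The runtime labels and the function names `select_runtime` and `load_runtime_module` are hypothetical and not taken from this repo. It assumes every macOS host should go to llama.cpp, and that the backing packages are `llama_cpp` (from llama-cpp-python) and `transformers`.

```python
import importlib
import platform

# Hypothetical runtime labels; the issue does not name them.
LLAMA_CPP = "llama_cpp"
HF_PIPELINE = "hf_pipeline"


def select_runtime(system=None):
    """Return a runtime label for the given OS name (defaults to the current host)."""
    system = system or platform.system()
    # Assumption: route all macOS hosts to llama.cpp, because the
    # bitsandbytes-quantized HF pipeline does not work on MPS yet.
    if system == "Darwin":
        return LLAMA_CPP
    return HF_PIPELINE


def load_runtime_module(runtime):
    """Import the backing package lazily, so only the selected runtime must be installed."""
    module_name = {LLAMA_CPP: "llama_cpp", HF_PIPELINE: "transformers"}[runtime]
    return importlib.import_module(module_name)
```

Deferring the import to `load_runtime_module(select_runtime())` means a Mac install would not need bitsandbytes at all, and a Linux/CUDA install would not need llama-cpp-python.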