louis030195 opened 1 year ago
I believe Apple Silicon is already supported for StableLM models via external projects like GGML (aka llama.cpp).
The PyTorch nightly releases support int64 cumsum ops (on macOS 13.3+). I've managed to get the sample code working by installing:
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
and changing the device to `mps` instead of `cuda`. The outputs are mostly nonsensical though; I assume that's due to an issue in the MPS backend, seeing as the CUDA implementation works OK (or at least the sample app on Hugging Face does).
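For reference, the device switch I described can be sketched as a small helper. The flag-based function below is a hypothetical stand-in so the logic is testable without a GPU; in real code the flags would come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`:

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Prefer MPS on Apple Silicon, then CUDA, then fall back to CPU.

    In actual PyTorch code the arguments would be
    torch.backends.mps.is_available() and torch.cuda.is_available(),
    and the returned string would be passed to torch.device(...).
    """
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


# On an M-series Mac with the nightly wheel installed:
print(pick_device(True, False))   # mps
# On a CUDA machine:
print(pick_device(False, True))   # cuda
```

Then `model.to(pick_device(...))` is the only change needed relative to the CUDA sample, which is what I did to get it running.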
It would be nice to support MPS so this model can run on consumer hardware. It would be super useful, for example with Apple Shortcuts + Raycast etc. I already have a bunch of GPT-4 shortcuts which I would be happy to try with a non-privacy-leaking/faster model.
Since there is no code available, I cannot point out where the fix for this would go.