EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Pre-built binary for macOS Silicon does not seem to use Metal / GPU #629

Open ChristianWeyer opened 1 month ago

ChristianWeyer commented 1 month ago

Describe the bug

Download https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.2/mistralrs-server-aarch64-apple-darwin.tar.xz

Use a tool like asitop to see GPU usage.

Run e.g. mistralrs-server -i plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama

Chat with the model.

No GPU usage can be seen in asitop.

Latest commit or version

mistralrs-server -i plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama

ChristianWeyer commented 1 month ago

... and when trying to cargo build --release --features metal

I get an error:

Compiling mistralrs-bench v0.2.2 (/Users/christianweyer/Sources/mistral.rs/mistralrs-bench)
error[E0107]: enum takes 2 generic arguments but 1 generic argument was supplied
   --> mistralrs-pyo3/src/lib.rs:50:20
    |
50  | fn get_device() -> Result<Device> {
    |                    ^^^^^^ ------ supplied 1 generic argument
    |                    |
    |                    expected 2 generic arguments
    |
note: enum defined here, with 2 generic parameters: `T`, `E`
   --> /Users/christianweyer/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:502:10
    |
502 | pub enum Result<T, E> {
    |          ^^^^^^ -  -
help: add missing generic argument
    |
50  | fn get_device() -> Result<Device, E> {
    |                                 +++

For more information about this error, try `rustc --explain E0107`.
error: could not compile `mistralrs-pyo3` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...

ChristianWeyer commented 1 month ago

Interestingly, this works: cargo run --release --features metal -- --port 1234 plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama

and uses the GPU.
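
For readers unfamiliar with E0107 as it appears in the build log above: `Result<Device>` with a single type argument only compiles when a one-parameter `Result` alias is in scope, as several crates provide. Otherwise the name resolves to the two-parameter enum from `core` and rustc rejects it. The following is a minimal sketch of that mechanism, not the code from mistralrs-pyo3; the `Device` enum and the alias here are hypothetical stand-ins, and the thread does not say which alias the crate expected.

```rust
use std::error::Error;

// Hypothetical stand-in for a device type; not the type used in mistral.rs.
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
}

// A one-parameter alias that shadows `std::result::Result` in this module.
// If this alias (or an equivalent import) is missing, `Result<Device>` below
// refers to the two-parameter enum and rustc reports E0107.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

// Same signature shape as the function named in the build log.
fn get_device() -> Result<Device> {
    Ok(Device::Cpu)
}

fn main() {
    println!("{:?}", get_device());
}
```

Removing the `type Result<T>` line from this sketch reproduces the same class of error as in the log.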

EricLBuehler commented 1 month ago

@ChristianWeyer the pre-built binaries only work on the CPU.
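
A binary can only use the GPU if the accelerator backend was compiled in, which is consistent with `cargo run --release --features metal` using the GPU while the pre-built download does not. The sketch below shows the general pattern of selecting a device based on a Cargo feature at compile time. It is an illustration under assumptions: the `Device` enum and `select_device` function are hypothetical and are not taken from mistral.rs.

```rust
// Hypothetical device enum; not the actual type from mistral.rs.
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Metal,
}

// `cfg!(feature = "metal")` is evaluated at compile time. A binary built
// without `--features metal` always takes the CPU branch, regardless of the
// hardware it later runs on.
#[allow(unexpected_cfgs)]
fn select_device() -> Device {
    if cfg!(feature = "metal") {
        Device::Metal
    } else {
        Device::Cpu
    }
}

fn main() {
    println!("selected device: {:?}", select_device());
}
```

Built without the feature, this program reports the CPU device even on Apple Silicon, which matches the behavior described for the pre-built binaries.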

ChristianWeyer commented 1 month ago

What is the reason for this?

EricLBuehler commented 1 month ago

We use cargo dist to generate the binaries in CI, but it would take an extremely long time to build binaries for the whole matrix of accelerators and platforms. Besides that, we would also need to use something like Docker to build for CUDA/Metal, because GitHub CI doesn't support that.
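
To illustrate why the matrix grows quickly, the sketch below enumerates platform and accelerator combinations. The lists are illustrative assumptions, not the project's actual release targets (only `aarch64-apple-darwin` appears in this thread), and the `supported` rule is a simplification.

```rust
// Illustrative lists only; the real release matrix is not given in this thread.
static PLATFORMS: [&str; 3] = [
    "aarch64-apple-darwin",
    "x86_64-unknown-linux-gnu",
    "x86_64-pc-windows-msvc",
];
static ACCELERATORS: [&str; 3] = ["cpu", "cuda", "metal"];

// Simplifying assumption: Metal only on Apple targets, CUDA only elsewhere.
fn supported(platform: &str, accel: &str) -> bool {
    let is_apple = platform.ends_with("apple-darwin");
    match accel {
        "metal" => is_apple,
        "cuda" => !is_apple,
        _ => true,
    }
}

// Every (platform, accelerator) pair that would need its own release build.
fn build_matrix() -> Vec<(&'static str, &'static str)> {
    let mut jobs = Vec::new();
    for &platform in PLATFORMS.iter() {
        for &accel in ACCELERATORS.iter() {
            if supported(platform, accel) {
                jobs.push((platform, accel));
            }
        }
    }
    jobs
}

fn main() {
    for (platform, accel) in build_matrix() {
        println!("{platform} + {accel}");
    }
}
```

Each added accelerator or platform multiplies the number of release builds, and every one of them is a full optimized compile.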