I don't think Metal is supported (yet).
@QueryType Thanks! I see Metal is on the roadmap.
Yes, it is, but low priority. Currently, when I tried on a Mac, it saturated the CPUs and triggered swap. I am trying to find the best combination of parameters to avoid that. GPU support would indeed be welcome.
Total rookie here; this is my command for using the GPU on an M1 Max (64 GB), and no swap was used:
./main -t 10 -ngl 32 -m ./models/airoboros-65B-gpt4-1.2.ggmlv3.q3_K_L.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "USER: What does llama do?\nASSISTANT:"
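(-ngl sets how many model layers are offloaded to the GPU via Metal; -t is the number of CPU threads; -c is the context size; -n is the number of tokens to predict, with -1 meaning generate indefinitely.)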
And here's the speed:
llama_print_timings:        load time = 53395.29 ms
llama_print_timings:      sample time =     3.72 ms /    41 runs   (    0.09 ms per token, 11030.40 tokens per second)
llama_print_timings: prompt eval time =   754.11 ms /    15 tokens (   50.27 ms per token,    19.89 tokens per second)
llama_print_timings:        eval time = 10442.69 ms /    40 runs   (  261.07 ms per token,     3.83 tokens per second)
llama_print_timings:       total time = 11208.28 ms
ggml_metal_free: deallocating
Hope it helps!
Hi @Crear12, thanks for the info! I think it applies only to inference, not finetuning :) but thanks for sharing!
I opened a bug, https://github.com/ggerganov/llama.cpp/issues/3911, after -ngl was introduced in finetune; in practice it's still unusable. Maybe you can have a look.
@fakerybakery Hey, I'm interested in doing the same on a MacBook Pro M1, and I was wondering if you could share what your data looks like, and what documentation you used to come up with your script with the different parameters explained. Thanks.
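For anyone looking for a starting point: as far as I know, the finetune example in llama.cpp trains a LoRA adapter from a plain-text file, so the data is just raw text. A minimal sketch along the lines of the invocation in its README, with the model name and data file as placeholders to substitute:

# --model-base: the quantized GGUF model to finetune
# --train-data: a plain-text corpus
# --lora-out:   where the trained LoRA adapter is written
./finetune \
        --model-base open-llama-3b-v2-q8_0.gguf \
        --checkpoint-in  chk-lora-open-llama-3b-v2-q8_0-LATEST.gguf \
        --checkpoint-out chk-lora-open-llama-3b-v2-q8_0-ITERATION.gguf \
        --lora-out lora-open-llama-3b-v2-q8_0-ITERATION.bin \
        --train-data "shakespeare.txt" \
        --save-every 10 \
        --threads 6 --adam-iter 30 --batch 4 --ctx 64 \
        --use-checkpointing

The ITERATION/LATEST patterns in the checkpoint filenames are filled in by the tool itself when it saves.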
Hey guys, while I keep an eye on the development of finetune for Metal, is there any example of finetuning for classification rather than pure text?
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hi, I'm running the latest version of llama.cpp (cloned yesterday from the Git repo) on macOS Sonoma 14.0 on an M1 MacBook Pro. I tried to finetune a Llama model, and the training worked; however, it was extremely slow, and Activity Monitor did not indicate any GPU usage for the finetuning script, although it was using most of my CPU. Here is my script:
Here is the console output:
Is Metal supported for finetuning yet? If so, is any configuration needed to get it to work? Thanks in advance!
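One general thing worth checking (a sanity check, not a confirmed fix) is that the binary was built with Metal enabled at all, e.g. with the Makefile build of that era:

LLAMA_METAL=1 make finetune

(on Apple Silicon builds of that period, Metal may already be on by default). That said, as the discussion above suggests, the finetune training loop itself still ran on the CPU at the time, so seeing little or no GPU usage during finetuning is expected.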