AlexCheema opened 3 weeks ago
Thanks @AlexCheema
Many others and I likely only have Windows systems, and llama.cpp is practically our only option.
MLX is macOS-only, and as for tinygrad:

> Windows support has been dropped to focus on Linux and Mac OS. Some functionality may work on Windows but no support will be provided, use WSL instead.

source: https://github.com/tinygrad/tinygrad/releases/tag/v0.7.0
For this opening statement to be true, it would need to include Windows-based systems, especially old gaming rigs:

> Forget expensive NVIDIA GPUs, unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, Linux, pretty much any device!
At the very least, a thorough guide on setting up tinygrad via WSL/WSL2 would be appreciated, because this is your only documentation:
> Example Usage on Multiple MacOS Devices
I'd like to look into this. Relatedly, llamafiles might be worth looking into, as they are binaries able to run on multiple desktop OSes without any configuration, though I'm not sure about Android or iOS support.
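To sketch what that integration could look like: a llamafile is a single self-contained executable that bundles llama.cpp and a model, and it starts llama.cpp's built-in server with an OpenAI-compatible API, so exo could in principle launch one and talk to it over HTTP. A rough, untested sketch (the filename, flags, port, and startup wait here are assumptions, not a tested recipe):

```python
# Rough sketch: launch a llamafile (a self-contained executable that starts
# llama.cpp's built-in server) and query its OpenAI-compatible endpoint.
import subprocess
import time

import requests

# On Linux/macOS the llamafile needs the execute bit set (chmod +x).
server = subprocess.Popen(["./model.llamafile", "--nobrowser", "--port", "8080"])
time.sleep(10)  # crude startup wait; a real integration would poll the server

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello from a llamafile"}]},
)
print(resp.json()["choices"][0]["message"]["content"])

server.terminate()
```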
Go for it!
@bayedieng I'd recommend looking at https://github.com/abetlen/llama-cpp-python -- it should hopefully be low level enough to do what we need to do. Also, I'd recommend looking at https://github.com/exo-explore/exo/pull/139 for a minimal implementation of an inference engine that doesn't require explicitly defining every model -- it's a general solution.
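To make the "low level enough" part concrete, here's a minimal, untested sketch of the token-level loop those bindings expose -- roughly the level of control a sharded inference engine needs, rather than the one-shot completion API. The model path and sampling parameters are placeholders:

```python
# Minimal sketch: token-by-token generation via llama-cpp-python's Llama class.
from llama_cpp import Llama

# Placeholder model path -- any GGUF model works here.
llm = Llama(model_path="./models/model-q4.gguf", n_ctx=2048)

prompt_tokens = llm.tokenize(b"What is the capital of France?")
llm.eval(prompt_tokens)  # run the whole prompt through the model

generated = []
for _ in range(32):
    token = llm.sample(temp=0.7)   # sample the next token from the current logits
    if token == llm.token_eos():   # stop on end-of-sequence
        break
    generated.append(token)
    llm.eval([token])              # feed the sampled token back in

print(llm.detokenize(generated).decode("utf-8", errors="ignore"))
```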
Thanks for the suggestion. Yeah, I had already seen the Python bindings and went ahead and opened a draft PR.
I wonder if WebGPU could be plugged in on top of llama.cpp via the https://github.com/AnswerDotAI/gpu.cpp wrapper?