ggerganov / llama.cpp

LLM inference in C/C++
MIT License
66.96k stars · 9.62k forks

metal : simplify kernel arguments using a struct #3229

Open ggerganov opened 1 year ago

ggerganov commented 1 year ago

Create a struct ggml_metal_locals and populate using GGML_TENSOR_LOCALS similar to what we do in ggml.c:

https://github.com/ggerganov/llama.cpp/blob/3b4bab6a38502d9e68587c2c19f26472480ec4dd/ggml.c#L244-L256

Refactor all kernels to accept a single `ggml_metal_locals` struct in order to avoid long lists of arguments such as:

https://github.com/ggerganov/llama.cpp/blob/3b4bab6a38502d9e68587c2c19f26472480ec4dd/ggml-metal.m#L753-L782

https://github.com/ggerganov/llama.cpp/blob/3b4bab6a38502d9e68587c2c19f26472480ec4dd/ggml-metal.metal#L29-L61

ZacharyDK commented 1 year ago

Hey @ggerganov. So I've done some digging. Objective-C is a headache, but it's necessary because Apple requires it for Metal. Unreal Engine is C++, and it avoids Objective-C entirely by using a C++ wrapper library. This is the library, by naleksiev:

https://github.com/naleksiev/mtlpp/blob/master/LICENSE

Refactoring the `ggml-metal.m` file and related files to use this library would cut out Objective-C, simplify the code base, and squash any bugs related to Objective-C usage. It would also likely fix the numerous kernel-loading bugs on Macs with AMD GPUs. This change should let Mac users utilize the GPU on whatever hardware they have; it shouldn't make a difference whether it's M1, M2, or AMD.

The mtlpp library has been tried and tested with Unreal Engine, so it will probably do the heavy lifting without too much pain.

ggerganov commented 5 months ago

✨✨ Here's an AI-assisted sketch of how you might approach this issue saved by @ggerganov using Copilot Workspace v0.17

Topic
Has the `ggml_metal_locals` struct been created and used to simplify kernel arguments in the metal kernels?
Before
* No, the `ggml_metal_locals` struct has not been created in the code.
* The Metal kernels still use long lists of arguments, as shown in `ggml-metal.m` and `ggml-metal.metal`.
* There is no evidence in the provided code snippets of a struct being used to simplify kernel arguments for Metal kernels.
After
* Yes, the `ggml_metal_locals` struct has been created and is now used in the Metal kernels.
* The struct is defined in `ggml-metal.h`, simplifying kernel arguments by encapsulating them.
* Metal kernels in `ggml-metal.metal` now accept a single `ggml_metal_locals` struct as their argument.
* The long lists of arguments in Metal kernels are replaced with the `ggml_metal_locals` struct, enhancing code readability and maintainability.
Plan
* `ggml-metal.h` (CHANGE)
  - Define a new struct `ggml_metal_locals` to encapsulate kernel arguments.
  - Include necessary headers and dependencies.
* `ggml-metal.m` (CHANGE)
  - Refactor all kernel function calls to use the `ggml_metal_locals` struct for arguments.
  - Update the function signatures to accept `ggml_metal_locals` instead of individual arguments.
* `ggml-metal.metal` (CHANGE)
  - Update kernel function signatures to accept a single `ggml_metal_locals` struct as their argument.
  - Adjust the kernel implementations to access arguments through the `ggml_metal_locals` struct.
Sketch of implementation
[View the changes](https://copilot-workspace.githubnext.com/ggerganov/llama.cpp/issues/3229?shareId=0fb55432-b8e3-4090-a3e8-debb98309d96)
Details
Code analyzed at b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2
ggerganov commented 5 months ago

Playing with the tech preview of "Copilot Workspaces": https://copilot-workspace.githubnext.com/ggerganov/llama.cpp/issues/3229?shareId=9c38fc11-f7d8-45b7-b1bc-81678a27a9e0

It does not like big files 😢
