ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.12k stars 9.18k forks

examples for iOS (objc / swift ui) #3284

Closed bachittle closed 9 months ago

bachittle commented 11 months ago

I really enjoyed the examples for running whisper.cpp on iOS using both objective-c and swift-ui (found at https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.objc and https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.swiftui respectively) and was wondering if the process can be recreated for this repository. I believe that having a minimal example repository would be useful.

I'd be willing to make an attempt, but I need to familiarize myself with the process that was performed in the whisper.cpp repository.

jhen0409 commented 11 months ago

We recently fixed the iOS build, so I believe adding an example would be welcome.

bachittle commented 11 months ago

So after this PR it should be possible to build for iOS? https://github.com/ggerganov/llama.cpp/pull/3116

From what it looks like, I would need to generate the Xcode project using CMake, then wrap a project around that. Shouldn't be too bad!

ggerganov commented 11 months ago

Yes, an iOS example of, say, StarCoder 1B Q8_0 would be great. Make sure to enable Metal.

aehlke commented 11 months ago

I've been unable to find any working examples elsewhere on GitHub that wrap this library in Swift without crashing or showing broken output.

bachittle commented 11 months ago

The main issue seems to be that the API for llama.cpp is more complex than whisper.cpp's. The common files that provide convenience functions can't be wrapped trivially in Swift, since they use C++ features.

I tried messing around with the CMake setup, but I'm not a huge fan. I think an isolated example of an Xcode project would be nicer. I've been doing some experiments; we'll see how it turns out.

aehlke commented 11 months ago

https://github.com/guinmoon/LLMFarm has an up-to-date llama.cpp running on iOS/macOS (though not with the very latest iOS build setup), and https://github.com/CameLLM/CameLLM is out of date but may be useful to reference.

jhen0409 commented 11 months ago

If you're building a Swift project, an easier way is to use the Swift package. The CMake build is more for CI purposes to me; I haven't used it in a project yet.

Also, in the llama.rn binding I created, I ported the server_context struct from the server example, which you can also refer to. (It's an Objective-C project.)

bachittle commented 11 months ago

I created a custom repository to use llama.cpp as a Swift package. It successfully performs inference; however, I was not sure how to enable Metal. Here is the repo: https://github.com/bachittle/llama.swiftui

I also made a fork of llama.cpp and added the files manually, similar to how it's done in the whisper.swiftui example. This also successfully performed inference. Here is the fork for reference: https://github.com/bachittle/llama.cpp/tree/swiftui_example.

Finally, I tried enabling Metal. It failed with the following error: [Screenshot 2023-10-01 at 11 15 29]

I'm guessing I have the bundle in the wrong location; I will continue experimenting. But this is my progress so far. Here is the Metal branch: https://github.com/bachittle/llama.cpp/tree/swiftui_metal

jhen0409 commented 11 months ago


I've tried the llama.swiftui repo; it works on my Mac and iPad.

In your llama.cpp fork, it looks like llama.swiftui is not using the Swift package, so it's not able to get the Metal lib from the bundle. You can choose not to use the GGML_SWIFT define in the project and load the Metal file dynamically, or try using the local Swift package (this may have other problems to solve).

bachittle commented 10 months ago


Do you know how to enable Metal from the Swift package? I'm just more familiar with doing it manually, as was done in the swiftui_metal branch. I'll see if it works without the GGML_SWIFT define; the naming scheme was just confusing (since I am writing a Swift wrapper, I thought that's what I needed to enable, lol).

I can also try the local package; that would allow for less code repetition. I just figured I'd do it the same way it was done for whisper.swiftui for the example, for more flexibility that way.

jhen0409 commented 10 months ago

Do you know how to enable Metal from the Swift package?

It should be enabled by default in the Swift package if it's on arm64, where GGML_USE_METAL is defined:

https://github.com/ggerganov/llama.cpp/blob/1c84003c08027f5d3a4cb876f51d6b6224a34d0e/llama.cpp#L6572-L6574

And you can see that the kernels are loaded in the logs after loading a model.

bachittle commented 10 months ago

I was able to run inference of StarCoder 1B using Metal! It's running a lot faster now! I added a video to the README here: https://github.com/bachittle/llama.cpp/tree/swiftui_metal/examples/llama.swiftui

I will add more documentation after I update to the latest llama.cpp (it has conflicts with the newest codebase ATM).

bachittle commented 10 months ago

Tried updating to the latest llama.cpp (the batch update) and ran into issues. Detailed documentation can be found here: https://github.com/bachittle/llama.cpp/pull/1. In short, some models don't load at all due to an "invalid character" error, and the ones that do load crash when calling the llama_decode function. For now I am stuck until I figure out the cause.

So for now, the swiftui_metal branch is the most functional, but it is out of date with the latest llama.cpp.

ggerganov commented 10 months ago

I think whatever models you have downloaded, you need to re-convert them.

#3252 basically renders all models using the BPE tokenizer (such as StarCoder, Falcon, Refact, etc.) obsolete.

bachittle commented 10 months ago

Yes, that fixes the "invalid character" error. Now I just need to find the reason for the crash, and then we will be rolling!

jhen0409 commented 10 months ago

@bachittle I just gave the swiftui_metal_update branch at https://github.com/bachittle/llama.cpp/tree/swiftui_metal/examples/llama.swiftui a try. I confirmed that compiling the source with -O3 can fix the llama_decode crash (see #3527 and comments), but it doesn't solve the underlying problem; it just makes it work.

Then, I got some errors after llama_decode(...) in LibLlama.swift. I think these issues should be easy to fix.

bachittle commented 9 months ago

Sorry, I haven't updated in a while; I went on vacation and then got distracted with work. Anyway, I tried -O3 and am now getting an insufficient memory access error. This might mean that my device does not have enough memory? Not sure what to fix as of now.

[Screenshot 2023-11-17 at 19 47 09]

bachittle commented 9 months ago

Fixed everything! The latest commit here works with up-to-date llama.cpp and GGUF v2: https://github.com/bachittle/llama.cpp/tree/swiftui_metal_update

Also made a pull request: https://github.com/ggerganov/llama.cpp/pull/4159

StarCoder 1B runs the best, of course. 3B models like StableLM can also run if you have 6 GB of RAM; otherwise they will run if quantized. 7B models like Mistral need to be heavily quantized unless you're running on an iPhone 15 Pro Max with 8 GB of RAM.

1-ashraful-islam commented 8 months ago

I am trying to follow the example by @bachittle and add whisper.cpp to the iOS project, and I am getting a lot of redefinition errors for "ggml.h".


I also tried adding the whisper.spm package as a package dependency instead and got similar errors.

Sorry for re-opening the issue if it's something straightforward. I am pretty new to iOS development and haven't been able to find any solution for this.

1-ashraful-islam commented 8 months ago

I was able to fix the issue by forking the packages and updating the Package.swift definitions so that they both use ggml as a package dependency. I will open a pull request to add those changes.

1-ashraful-islam commented 8 months ago

I added the following pull requests to solve the issue I mentioned earlier.

https://github.com/ggerganov/ggml/pull/674 https://github.com/ggerganov/llama.cpp/pull/4691 https://github.com/ggerganov/whisper.cpp/pull/1701

When adding whisper.cpp and llama.cpp as package dependencies in a SwiftUI project, ggml gets imported in both packages, which causes the build to fail during the linker phase.

These pull requests solve the issue. @bachittle @ggerganov

enzokro commented 5 months ago

Hi all, thanks so much for the great work and resources in this thread!

A quick/silly question: how can we pass command-line arguments to the Swift app? For example, I am trying to run a 2B Gemma IT model, passing the suggested args seen here: Repeat-penalty args

I am new to Swift and Xcode and can't seem to find the right place to hook these in.

Thanks!