SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
MIT License

New Development Binaries System #827

Open martindevans opened 4 days ago

martindevans commented 4 days ago


We have a problem with the next binary update. The code changes have been done, and are sitting in this branch. The binaries have been compiled, and are sitting in this action.

However, the new binaries from llama.cpp are now too large to commit to GitHub (over 100MB). The push simply fails. GitHub suggests git-lfs, but this does not seem like a practical solution because the free bandwidth is too low: we would exhaust it with just 5 files. Therefore, we can no longer commit the binaries.
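As a rough illustration of the constraint described above, here is a minimal Python sketch that flags files too large to push. The file names and sizes are hypothetical, and treating the limit as exactly 100 MiB is an assumption (the comment above only says "over 100MB").

```python
# Assumption: the per-file limit is treated as 100 MiB.
LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(sizes, limit_bytes=LIMIT_BYTES):
    """Return the sorted names of files whose size exceeds the per-file limit."""
    return sorted(name for name, size in sizes.items() if size > limit_bytes)


# Hypothetical file names and sizes, for illustration only.
example_sizes = {
    "cuda12/llama.dll": 180 * 1024 * 1024,
    "cpu/llama.dll": 2 * 1024 * 1024,
}
print(files_over_limit(example_sizes))
```

A check like this could run at the end of a build to warn before a push fails, though that is only one possible use.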

Note that this does not affect the final distribution since that is done through nuget, it just affects development in this repo.


martindevans commented 4 days ago

Note: until this is resolved, there will be no new binary update :(

martindevans commented 4 days ago

I have pushed up a test of one potential solution (downloading directly from llama.cpp). You can see it here:

From a usability perspective this is pretty nice. Changing versions just requires updating that single LlamaCppReleaseTag property!

However, llama.cpp do not publish any shared objects for Linux!

AsakusaRinne commented 2 days ago

In my opinion, a better option is to drop the local binaries from git everywhere except for building the backend packages. It's similar to your option 2, and I'd like to describe it in detail.

All the binaries could be removed from the LLamaSharp git repo. Instead, we use the nuget backend packages in our example project. Where we put the binaries could be flexible, because users are not supposed to touch them. For example, we can put the binaries in another git repo with git-lfs and then reference it as a submodule.

In this case, when a binary update is merged into the master branch, we must publish a new release at once to make the new binaries available on nuget.

To keep our CI working, we need to update the submodules before running the tests in the workflows. We also need to copy the binaries to the output folder in LLamaSharp.unittest.csproj.

However, I have to say that unit test coverage would be a problem. What I said above assumes that the unit tests are only for CI and the examples are only for users. But at the moment the example project is also responsible for test coverage, which needs to be improved in the future.

I think this way is clearer for users because they only need to care about the nuget packages. Any ideas? @martindevans

martindevans commented 2 days ago

What do you think of the potential solution I pushed up here? With this idea, the process would be:

  1. Run a build with the GitHub action (same as now)
  2. Create a Release somewhere on GitHub (either in this repo, or a dedicated LLamaSharp-Binaries repo). This is really just acting as file storage.
  3. Change the <LlamaCppReleaseTag>b3289</LlamaCppReleaseTag> to refer to the new release version
  4. msbuild will magically download all of the files into the folders where they are now

The example linked above is downloading directly from llama.cpp, but that isn't an option at the moment (they don't publish Linux shared objects). So it would be modified to download from our own release instead, but the idea is the same.
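To make the "change one tag, download everything" idea concrete, here is a small Python sketch of how a release tag maps to download URLs. It relies on GitHub's standard release asset URL layout; the owner `example-owner` and the asset name are placeholders, not names from the prototype, and the real mechanism proposed in this thread is msbuild, not Python.

```python
def release_asset_url(owner, repo, tag, asset):
    """Build the download URL of a file attached to a GitHub release."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


# "example-owner" and the asset name are hypothetical placeholders.
url = release_asset_url(
    "example-owner", "LLamaSharp-Binaries", "b3289", "linux-x64-cuda12.zip"
)
print(url)
```

Because the tag is the only part that varies between versions, switching binaries would come down to editing a single value, which matches the usability point made earlier in the thread.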

> we can put the binaries in another git repo with git-lfs and then reference it as a submodule.

Just a note about git-lfs, unfortunately I don't think we can use it at all. The GitHub free limits on git-lfs are tiny - just 1GiB a month across your whole account! So downloading 5x CUDA binaries would totally exhaust your entire account allocation for the whole month. That's not something I want to risk happening by accident!
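The bandwidth concern is simple arithmetic. The sketch below assumes a CUDA binary of about 200 MiB, which is a made-up figure chosen for illustration; only the 1GiB monthly limit comes from the comment above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def downloads_before_exhaustion(quota_bytes, file_bytes):
    """Number of whole downloads of one file that fit inside the quota."""
    return quota_bytes // file_bytes


# Assumption: a single CUDA binary is roughly 200 MiB (hypothetical size).
assumed_cuda_binary = 200 * MIB
print(downloads_before_exhaustion(1 * GIB, assumed_cuda_binary))  # 1024 // 200 = 5
```

Under that assumed size, five downloads use up the month, and every clone or CI run that pulls the files counts against the same allowance.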

AsakusaRinne commented 2 days ago

> What do you think of the potential solution I pushed up here.

It's OK, but I think we should use the nuget package directly in our example project, so that it is clearer for new users. The only thing we need to do is publish a new nuget package every time we update the binaries. Since git-lfs is limited on GitHub, downloading from the release is a good option.

This approach raises another problem. To run the GitHub workflows, the unit test project has to download the new binaries, and those binaries are stored in a release. However, we shouldn't publish a release without passing all the workflows. The only way I can come up with is to delete the release if the workflow fails.

martindevans commented 2 days ago

> The binaries are put in release. However, we shouldn't publish a release without passing all the workflows. The only way I can come up with is to delete the release if the workflow fails.

I think if we published the releases ourselves we'd basically end up with two types of "release".

  1. There would be the releases we currently have - actual releases with a change in version number, extensive release notes etc etc.
  2. Then there would also be these new "binary only" releases which are not really releases, they're just a way to store files for dev.

It's pretty messy :(

Splitting out the releases to another repo (which exists just for binary releases) might be a way to work around that, but it's a bit of a pain to have multiple repos.

I'm going to go and open an issue on the llama.cpp repo asking about shared objects in their releases. That way we would be able to skip the entire build step, our binaries would be the "official" ones, and we wouldn't have to mess around with any binary-only-releases.

That'll probably be slower than anything we do ourselves here, but it seems like the best overall solution.

martindevans commented 2 days ago

Ok I changed my mind, I was typing up the feature request and it would be a colossal increase in the number of binaries they would need to compile for every release. I don't think there's any chance they would do it!

martindevans commented 2 days ago

I've created a new release in this repo, just to test what it would look like. It's here: If we decide to go ahead with investigating this approach I'll attach some binaries to it.

If we went with a "binary only" release in this repo, we would do this:

  1. Run a GitHub action to generate new binaries.
  2. Make a release with these binaries. It is not marked as the latest release, so the front page still points to 0.13.0.
  3. Make the necessary code changes to support the new binaries, and change LlamaCppReleaseTag in the csproj to point to the release we just created.
  4. Open a PR and test on all platforms. Anyone opening this version will automatically download the binaries.
  5. Merge it. Anyone opening the project on master will automatically download the new binaries.
  6. Make nuget packages, as normal. Publish a new "proper" release with release notes etc.
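The "automatically download the binaries" behaviour in this process can be sketched as a download-if-missing step keyed by the release tag. This is a Python illustration only: `ensure_binaries`, `fake_fetch`, and the cache layout are hypothetical, and the actual proposal does this inside msbuild.

```python
import os
import tempfile


def ensure_binaries(tag, assets, cache_dir, fetch):
    """Fetch each asset for `tag` unless already cached; return the names fetched."""
    target_dir = os.path.join(cache_dir, tag)
    os.makedirs(target_dir, exist_ok=True)
    fetched = []
    for asset in assets:
        path = os.path.join(target_dir, asset)
        if not os.path.exists(path):
            # `fetch` stands in for a real HTTP download of the release asset.
            with open(path, "wb") as handle:
                handle.write(fetch(tag, asset))
            fetched.append(asset)
    return fetched


def fake_fetch(tag, asset):
    """Hypothetical downloader that returns placeholder bytes."""
    return f"{tag}:{asset}".encode()


with tempfile.TemporaryDirectory() as cache:
    first = ensure_binaries("b3289", ["llama.dll", "ggml.dll"], cache, fake_fetch)
    second = ensure_binaries("b3289", ["llama.dll", "ggml.dll"], cache, fake_fetch)
    print(first, second)
```

Keying the cache by tag means a change to the tag triggers a fresh download, while repeated builds on the same tag do no network work.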

m0nsky commented 2 days ago

Hmm, what about compressing the deps (at the end of the build action in compile.yml) and extracting when building the project? CUDA12 in the llama.cpp repo is ~95MB. I just did a quick test here and the compressed CUDA12 dep archive (containing llama.dll + ggml.dll) resulted in 91MB.
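A minimal Python sketch of the compress-then-extract idea follows, using the standard `zipfile` module. The blob contents are placeholders rather than real DLLs, and the real build would do the extraction in msbuild rather than Python.

```python
import io
import zipfile


def zip_blobs(blobs):
    """Pack {name: bytes} into an in-memory zip archive using DEFLATE."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in blobs.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unzip_blobs(archive_bytes):
    """Extract an in-memory zip archive back into {name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Placeholder contents standing in for the real binaries.
blobs = {"llama.dll": b"\x00" * 10_000, "ggml.dll": bytes(range(256)) * 40}
packed = zip_blobs(blobs)
print(len(packed), sum(len(data) for data in blobs.values()))
assert unzip_blobs(packed) == blobs
```

These placeholder blobs compress far better than real binaries would; the 95MB to 91MB result quoted above suggests the actual files contain little redundancy.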

martindevans commented 2 days ago

The limit is 100MB, and according to an issue in the llama.cpp repo discussing the size, these binaries are going to grow (support for new GPUs, new kernels etc). Given how close we already are to the limit when zipped, that would only be a temporary solution. It is probably the easiest option though.
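To put a number on "how close we already are", this short Python sketch computes the remaining headroom from the two figures quoted in this thread (91MB zipped, 100MB limit). The function name is hypothetical.

```python
def growth_headroom(size_mb, limit_mb=100):
    """Return (absolute headroom in MB, headroom as a fraction of current size)."""
    headroom = limit_mb - size_mb
    return headroom, headroom / size_mb


absolute, relative = growth_headroom(91)
print(f"{absolute} MB headroom, about {relative:.1%} growth before hitting the limit")
```

So the zipped archive could grow by roughly a tenth before the push starts failing again.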

m0nsky commented 1 day ago

Yes, indeed. I don't know how much or how fast the binaries on the llama.cpp side will grow, but it sounds like a matter of time until we run into the same issue. If the transition to the LLamaSharp-Binaries workflow goes smoothly, a temporary solution could probably be skipped entirely.

I think it's cleaner to split the binaries to LLamaSharp-Binaries, to avoid confusion in the LLamaSharp releases section.

So, when doing a binary update:

martindevans commented 6 minutes ago

Created a repo here for development:

I'll put together a prototype downloading binaries from here, and will transfer ownership to SciSharp if we go ahead with this approach.