Open martindevans opened 4 days ago
Note: until this is resolved, there will be no new binary update :(
I have pushed up a test of one potential solution (downloading directly from llama.cpp). You can see it here: https://github.com/SciSharp/LLamaSharp/blob/july-2024-binaries/LLama/LLamaSharp.csproj#L55
From a usability perspective this is pretty nice. Changing versions just requires updating that single LlamaCppReleaseTag property!
However, llama.cpp do not publish any shared objects for Linux!
In my opinion, it's a better option to drop the local binaries in git everywhere except the backend package building. It's similar to your option2 and I'd like to give a detailed description of it.
All the binaries could be removed from the LLamaSharp git repo. Instead, we use the nuget backend packages in our example project. Where to put the binaries could be flexible because users are not supposed to touch them. For example, we can put the binaries in another git repo with git-lfs and then reference it as a submodule.
In this case, when a binaries update is merged into the master branch, we must publish a new release at once to make the new binaries available on nuget.
To keep our CI available, we need to update the submodules before running the tests in workflows. Besides, we need to copy the binaries to the output folder in LLamaSharp.unittest.csproj.
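For illustration, copying the binaries into the test output could look roughly like the fragment below in LLamaSharp.unittest.csproj. This is only a sketch: the `../binaries/runtimes` path is a hypothetical location for the submodule checkout, not something that exists in the repo today.

```xml
<!-- Sketch only: goes inside the <Project> element of the unit test csproj. -->
<ItemGroup>
  <!-- Assumption: the binaries submodule is checked out at ../binaries (hypothetical path). -->
  <!-- Link keeps the runtimes/... folder layout when the files land in the output folder. -->
  <None Include="../binaries/runtimes/**/*"
        Link="runtimes/%(RecursiveDir)%(Filename)%(Extension)"
        CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
```

With `PreserveNewest` the files are only copied again when the source is newer, so repeated local builds stay cheap. The workflow would still have to update the submodule before the build, as described above.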
However, I have to say that the unit test coverage would be a problem. What I said above is based on an assumption that unit test is only for CI and examples are only for users. But actually, the example project is also responsible for test coverage now, which needs to be improved in the future.
I think this way is more clear for users because they only need to care about the nuget packages. Any ideas? @martindevans
What do you think of the potential solution I pushed up here? With this idea, the process would be:
- Create a Release somewhere on GitHub (either in this repo, or a dedicated LLamaSharp-Binaries repo). This is really just acting as file storage.
- Update <LlamaCppReleaseTag>b3289</LlamaCppReleaseTag> to refer to the new release version.

The example linked above is downloading directly from llama.cpp, but that isn't an option at the moment (they don't publish Linux shared objects). So it would be modified to download from our own release (e.g. https://github.com/martindevans/LLamaSharp/releases/tag/test-binaries), but the idea is the same.
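To make the idea concrete, here is a minimal sketch of a download-on-build target. It is not the exact content of the linked csproj. `OWNER/REPO`, the asset name `deps.zip`, and the `BinaryArchiveUrl` / `BinaryCacheDir` property names are placeholders made up for illustration; only `LlamaCppReleaseTag` and the `b3289` value come from the example above. It uses the built-in `DownloadFile` and `Unzip` tasks, which need MSBuild 15.8 or later.

```xml
<!-- Sketch only: goes inside the <Project> element of LLamaSharp.csproj. -->
<PropertyGroup>
  <!-- The one value to change on a binary update. -->
  <LlamaCppReleaseTag>b3289</LlamaCppReleaseTag>
  <!-- Assumptions: OWNER/REPO and the asset name deps.zip are placeholders. -->
  <BinaryArchiveUrl>https://github.com/OWNER/REPO/releases/download/$(LlamaCppReleaseTag)/deps.zip</BinaryArchiveUrl>
  <!-- One folder per tag, so switching tags triggers a fresh download. -->
  <BinaryCacheDir>$(MSBuildProjectDirectory)/runtimes/deps/$(LlamaCppReleaseTag)</BinaryCacheDir>
</PropertyGroup>

<!-- Runs before the build; skipped once the archive for this tag is already on disk. -->
<Target Name="DownloadNativeBinaries"
        BeforeTargets="BeforeBuild"
        Condition="!Exists('$(BinaryCacheDir)/deps.zip')">
  <DownloadFile SourceUrl="$(BinaryArchiveUrl)"
                DestinationFolder="$(BinaryCacheDir)"
                DestinationFileName="deps.zip"
                Retries="3" />
  <Unzip SourceFiles="$(BinaryCacheDir)/deps.zip"
         DestinationFolder="$(BinaryCacheDir)" />
</Target>
```

Keying the cache folder on the tag means anyone who pulls a commit with a new tag downloads the matching binaries on their next build, without any manual step.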
we can put the binaries in another git repo with git-lfs and then reference it as a submodule.
Just a note about git-lfs, unfortunately I don't think we can use it at all. The GitHub free limits on git-lfs are tiny - just 1GiB a month across your whole account! So downloading 5x CUDA binaries would totally exhaust your entire account allocation for the whole month. That's not something I want to risk happening by accident!
What do you think of the potential solution I pushed up here?
It's ok, but I think we should use the nuget package directly in our example project. That would be clearer for new users. The only thing we need to do is publish a new nuget package every time we update the binaries. Since git-lfs is limited on GitHub, downloading from the release is a good option.
This leads to another problem. If you want to run the GitHub workflows, the unit test project needs to download the new binaries. The binaries are put in a release. However, we shouldn't publish a release without passing all the workflows. The only way I can come up with is to delete the release if the workflow fails.
The binaries are put in a release. However, we shouldn't publish a release without passing all the workflows. The only way I can come up with is to delete the release if the workflow fails.
I think if we published the releases ourselves we'd basically end up with two types of "release".
It's pretty messy :(
Splitting out the releases to another repo (which exists just for binary releases) might be a way to work around that, but it's a bit of a pain to have multiple repos.
I'm going to go and open an issue on the llama.cpp repo asking about shared objects in their releases. That way we would be able to skip the entire build step, our binaries would be the "official" ones, and we wouldn't have to mess around with any binary-only-releases.
That'll probably be slower than anything we do ourselves here, but it seems like the best overall solution.
Ok I changed my mind, I was typing up the feature request and it would be a colossal increase in the number of binaries they would need to compile for every release. I don't think there's any chance they would do it!
I've created a new release in this repo, just to test what it would look like. It's here: https://github.com/SciSharp/LLamaSharp/releases/tag/test-release-please-ignore. If we decide to go ahead with investigating this approach I'll attach some binaries to it.
If we went with a "binary only" release in this repo, we would do this:
1) Run a GitHub action to generate new binaries
2) Make a release with these binaries. It's not marked as the latest release, so the front page still points to 0.13.0
3) Make necessary code changes to support new binaries, change LlamaCppReleaseTag in csproj to point to the release we just created
4) Open PR, test on all platforms. Anyone opening this version will automatically download the binaries.
5) Merge it. Anyone opening the project on master will auto download the new binaries.
6) Make nuget packages, as normal. Publish a new "proper" release with release notes etc.
Hmm, what about compressing the deps (at the end of the build action in compile.yml) and extracting when building the project? CUDA12 in the llama.cpp repo is ~95MB. I just did a quick test here and the compressed CUDA12 dep archive (containing llama.dll + ggml.dll) resulted in 91MB.
The limit is 100MB and, according to an issue in the llama.cpp repo discussing the size, these binaries are going to grow (support for new GPUs, new kernels etc). Given how close we already are to the limit when zipped, that'd be a temporary solution. It is probably the easiest option though.
Yes, indeed. I don't know how much or how fast the binaries on the llama.cpp side will grow, but it sounds like a matter of time until we run into the same issue. If the transition to the LLamaSharp-Binaries workflow goes smoothly, a temporary solution could probably be skipped entirely.
I think it's cleaner to split the binaries to LLamaSharp-Binaries, to avoid confusion in the LLamaSharp releases section.
So, when doing a binary update:
Created a repo here for development: https://github.com/martindevans/LLamaSharpBinaries/releases/tag/1c5eba6f8e62
I'll put together a prototype downloading binaries from here, and will transfer ownership to SciSharp if we go ahead with this approach.
Description
We have a problem with the next binary update. The code changes have been done, and are sitting in this branch. The binaries have been compiled, and are sitting in this action.
However, the new binaries from llama.cpp are now too large to commit to GitHub (over 100MB). The push simply fails. GitHub suggests git-lfs, but this does not seem like a practical solution because the free bandwidth is too low - we would exhaust it with just 5 files. Therefore, we can no longer commit the binaries.
Note that this does not affect the final distribution since that is done through nuget, it just affects development in this repo.
Ideas: