Closed by newsletternewsletter 1 month ago
Unfortunately the CUDA12 package has exceeded the NuGet package size limit. llama.cpp produces enormous binaries for CUDA due to the sheer number of kernels being compiled. We didn't know about this size limit until we hit it halfway through the 0.17.0 deployment, so it's been left in a bit of a broken state for now 😱
We're going to have to work out an entirely new way to deploy the binaries (probably split them into multiple packages), but until that's done it'll remain broken, sorry :(
But can the binaries be downloaded from the llama.cpp repository? Which ones are the right ones to take?
It's documented in the table at the bottom of the readme: https://github.com/SciSharp/LLamaSharp?tab=readme-ov-file#map-of-llamasharp-and-llamacpp-versions.
For 0.17.0 it's this commit, which corresponds to this version.
Do I need to download the files that start with llama-b3804...? There are many different ones there :) Maybe you could just link to llama.cpp and drop your own runtime packages?
If you want replacements for the missing CUDA binaries, download either cudart-llama-bin-win-cu11.7.1-x64.zip or cudart-llama-bin-win-cu12.2.0-x64.zip.
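Putting that together, here's a minimal sketch of fetching one of those archives. It assumes llama.cpp's usual release layout (b-number tags under github.com/ggerganov/llama.cpp/releases) still applies to this build; check the release page if the URL 404s.

```shell
# Build tag for LLamaSharp 0.17.0 (from the version-map table / this thread).
TAG="b3804"
# CUDA runtime archive; swap in cudart-llama-bin-win-cu11.7.1-x64.zip for CUDA 11.
FILE="cudart-llama-bin-win-cu12.2.0-x64.zip"

# llama.cpp release assets live under releases/download/<tag>/<asset>.
URL="https://github.com/ggerganov/llama.cpp/releases/download/${TAG}/${FILE}"
echo "$URL"

# curl -L -O "$URL"   # uncomment to actually download the archive
```

Extract the DLLs next to your application's native runtime folder, where the missing NuGet package would have put them.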
Thanks! Oh, 400 MB for CUDA 12 now? And what about the CPU version?
The CPU backend is available through NuGet (https://www.nuget.org/packages/LLamaSharp.Backend.Cpu), but if you want to download it manually you would grab one of the bin-win-avx versions, choosing the AVX level appropriate for your hardware (normally this is done automatically by the NuGet package).
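If you're picking the AVX level by hand, a small sketch of the selection logic (the `pick_avx_level` helper is made up here; the flag names match what Linux reports in /proc/cpuinfo, and the suffixes match llama.cpp's avx512/avx2/avx/noavx build names):

```shell
# pick_avx_level: given a CPU flag string, echo the best matching
# llama.cpp build suffix, preferring avx512 > avx2 > avx > noavx.
pick_avx_level() {
  case " $1 " in
    *" avx512f "*) echo "avx512" ;;
    *" avx2 "*)    echo "avx2" ;;
    *" avx "*)     echo "avx" ;;
    *)             echo "noavx" ;;
  esac
}

# On Linux you could feed it the real CPU flags:
#   pick_avx_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
pick_avx_level "fpu sse sse2 avx avx2"
```

You'd then download the llama-bXXXX-bin-win-<level>-x64.zip archive matching the echoed level.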
When you build the runtime for Linux, is it universal? I see builds only for Ubuntu. Which Linux distributions will it work on?
I'm not really very familiar with the details of it on Linux tbh. The normal LLamaSharp runtimes are built with Ubuntu, so if you use the llama.cpp binaries built on Linux they'll be just as compatible as any of our other Linux releases are.
cudart-llama
That's right, though you'll also need the main CUDA build: llama-bXXXX-bin-win-cuda-cu12.2.0-x64.zip
Problem: llama.cpp does not upload builds for Linux. Maybe you could also post the runtime archives on GitHub, e.g.:
win-x64/cpu, win-x64/cuda11, win-x64/cuda12, ... linux-x64/cpu, linux-x64/cuda11, linux-x64/cuda12, ...
Ah, if llama.cpp doesn't offer them you could also download the prebuilt ones from here: https://github.com/SciSharp/LLamaSharpBinaries/releases/tag/c35e586ea5722184
These are the ones that were meant to be published.
Thanks!
I know the CUDA backend depends on large libraries like libcuda and libcudart, but why has the Vulkan package not been updated yet? It's smaller than the CPU backend and has no dependencies.
Vulkan uploads after CUDA, so it never got deployed due to the failure with CUDA. It will hopefully be fixed with the next update, which will be done as soon as there's a fix for the CUDA packages.
The best plan we have for now is to split each CUDA backend into three smaller packages.
That way we can make each individual package smaller, but the end user experience is the same (just depend on the root CUDA package).
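As a rough illustration of that layout, the root package could keep its existing ID and carry no binaries itself, only dependencies on the sub-packages that do. This is purely a hypothetical sketch: the sub-package IDs below are made up, and the real split may look different.

```xml
<!-- Hypothetical root-package .nuspec: the end user still depends only on
     LLamaSharp.Backend.Cuda12; the (invented) sub-packages hold the binaries. -->
<package>
  <metadata>
    <id>LLamaSharp.Backend.Cuda12</id>
    <version>0.18.0</version>
    <dependencies>
      <dependency id="LLamaSharp.Backend.Cuda12.Part1" version="[0.18.0]" />
      <dependency id="LLamaSharp.Backend.Cuda12.Part2" version="[0.18.0]" />
      <dependency id="LLamaSharp.Backend.Cuda12.Part3" version="[0.18.0]" />
    </dependencies>
  </metadata>
</package>
```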
I probably won't personally have time to look into that for a couple of weeks. If anyone else wants to make a start on it I'll be happy to provide tips and reviews (just ping me here or on Discord).
This should now be fixed in the latest (v0.18.0) release!
Description
On nuget.org, the LLamaSharp and LLamaSharp.Backend.Cpu 0.17.0 NuGet packages are available, but the corresponding CUDA package, LLamaSharp.Backend.Cuda12 0.17.0, seems to be missing. Will it be available at a later date?