Closed by newsletternewsletter 1 month ago
Unfortunately the CUDA12 package has exceeded the NuGet package size limit. llama.cpp produces enormous binaries for CUDA due to the sheer number of kernels being compiled. We didn't know about this size limit until we hit it halfway through the 0.17.0 deployment, so it's been left in a bit of a broken state for now 😱
We're going to have to work out an entirely new way to deploy the binaries (probably split them into multiple packages), but until that's done it'll remain broken, sorry :(
But can the binaries be downloaded from the llama.cpp repository? Which ones are the right ones to take?
It's documented in the table at the bottom of the readme: https://github.com/SciSharp/LLamaSharp?tab=readme-ov-file#map-of-llamasharp-and-llamacpp-versions.
For 0.17.0 it's this commit, which corresponds to this version.
Do I need to download the files that start with llama-b3804...? There are many different ones there :) Maybe you could just link to llama.cpp and drop your own runtime packages?
If you want replacements for the missing CUDA binaries, download either cudart-llama-bin-win-cu11.7.1-x64.zip or cudart-llama-bin-win-cu12.2.0-x64.zip.
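Putting that together, here's a minimal sketch of fetching one of those archives. It assumes llama.cpp's usual release layout (b-number tags under github.com/ggerganov/llama.cpp/releases) still applies to this build; check the release page if the URL 404s.

```shell
# Build tag for LLamaSharp 0.17.0 (from the version-map table / this thread).
TAG="b3804"
# CUDA runtime archive; swap in cudart-llama-bin-win-cu11.7.1-x64.zip for CUDA 11.
FILE="cudart-llama-bin-win-cu12.2.0-x64.zip"

# llama.cpp release assets live under releases/download/<tag>/<asset>.
URL="https://github.com/ggerganov/llama.cpp/releases/download/${TAG}/${FILE}"
echo "$URL"

# curl -L -O "$URL"   # uncomment to actually download the archive
```

Extract the DLLs next to your application's native runtime folder, where the missing NuGet package would have put them.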
Thanks! Oh, 400 MB for CUDA 12 now? And what about the CPU version?
The CPU backend is available through NuGet (https://www.nuget.org/packages/LLamaSharp.Backend.Cpu), but if you want to download it manually you would grab one of the bin-win-avx versions, choosing the AVX level appropriate for your hardware (normally this is done automatically by the NuGet package).
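If you're picking the AVX level by hand, a small sketch of the selection logic (the `pick_avx_level` helper is made up here; the flag names match what Linux reports in /proc/cpuinfo, and the suffixes match llama.cpp's avx512/avx2/avx/noavx build names):

```shell
# pick_avx_level: given a CPU flag string, echo the best matching
# llama.cpp build suffix, preferring avx512 > avx2 > avx > noavx.
pick_avx_level() {
  case " $1 " in
    *" avx512f "*) echo "avx512" ;;
    *" avx2 "*)    echo "avx2" ;;
    *" avx "*)     echo "avx" ;;
    *)             echo "noavx" ;;
  esac
}

# On Linux you could feed it the real CPU flags:
#   pick_avx_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
pick_avx_level "fpu sse sse2 avx avx2"
```

You'd then download the llama-bXXXX-bin-win-<level>-x64.zip archive matching the echoed level.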
When you build the runtime for Linux, is it universal? I see builds only for Ubuntu. Which Linux distributions will it work on?
I'm not really very familiar with the details of it on Linux tbh. The normal LLamaSharp runtimes are built with Ubuntu, so if you use the llama.cpp binaries built on Linux they'll be just as compatible as any of our other Linux releases are.
cudart-llama
That's right, though you'll also need the main CUDA build: llama-bXXXX-bin-win-cuda-cu12.2.0-x64.zip
Problem: llama.cpp does not upload builds for Linux. Maybe you could also post the runtime archives on GitHub, e.g.:
win-x64/cpu, win-x64/cuda11, win-x64/cuda12, ... linux-x64/cpu, linux-x64/cuda11, linux-x64/cuda12, ...
Ah, if llama.cpp doesn't offer them you could also download the prebuilt ones from here: https://github.com/SciSharp/LLamaSharpBinaries/releases/tag/c35e586ea5722184
These are the ones that were meant to be published.
Thanks!
I know the CUDA backend depends on large libraries like libcuda and libcudart, but why has the Vulkan package not been updated yet? It's smaller than the CPU backend and has no dependencies.
Vulkan uploads after CUDA, so it never got deployed due to the failure with CUDA. It will hopefully be fixed with the next update, which will be done as soon as there's a fix for the CUDA packages.
The best plan we have for now is to split each CUDA backend into three smaller packages.
That way we can make each individual package smaller, but the end user experience is the same (just depend on the root CUDA package).
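As a rough illustration of that layout, the root package could keep its existing ID and carry no binaries itself, only dependencies on the sub-packages that do. This is purely a hypothetical sketch: the sub-package IDs below are made up, and the real split may look different.

```xml
<!-- Hypothetical root-package .nuspec: the end user still depends only on
     LLamaSharp.Backend.Cuda12; the (invented) sub-packages hold the binaries. -->
<package>
  <metadata>
    <id>LLamaSharp.Backend.Cuda12</id>
    <version>0.18.0</version>
    <dependencies>
      <dependency id="LLamaSharp.Backend.Cuda12.Part1" version="[0.18.0]" />
      <dependency id="LLamaSharp.Backend.Cuda12.Part2" version="[0.18.0]" />
      <dependency id="LLamaSharp.Backend.Cuda12.Part3" version="[0.18.0]" />
    </dependencies>
  </metadata>
</package>
```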
I probably won't personally have time to look into that for a couple of weeks. If anyone else wants to make a start on it I'll be happy to provide tips and reviews (just ping me here or on Discord).
This should now be fixed in the latest (v0.18.0) release!
Description
On nuget.org, the LLamaSharp and LLamaSharp.Backend.Cpu 0.17.0 NuGet packages are available, but the corresponding CUDA package, LLamaSharp.Backend.Cuda12 0.17.0, seems to be missing. Will it be available at a later date?