
pulling Granite 3 models #367

Open tarilabs opened 2 hours ago

tarilabs commented 2 hours ago

Currently, granite is shown as a shortcode corresponding to a GGUF serialization:

https://github.com/containers/ramalama/blob/cd1e7d53b570beb00cb767b97fe14749b3932ac0/README.md?plain=1#L56-L57
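For context, those lines map the `granite` shortname to a GGUF build, so the current series can already be fetched with:

```
# Pull the model behind the existing "granite" shortname
ramalama pull granite
```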

What would be the equivalent for the recently released Granite 3 series, please?

https://huggingface.co/collections/ibm-granite/granite-30-models-66fdb59bbb54785c3512114f

Can ramalama also pull the HF ModelCard, so as to make use of it during push?

ericcurtin commented 2 hours ago

I can't seem to find GGUFs for those on Hugging Face, but since they are on Ollama, we can already pull them via the shortnames granite3-dense and granite3-moe.
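For example (assuming these shortnames are present in the shortnames config your ramalama build ships with):

```
# Pull the Granite 3 models via their Ollama-backed shortnames
ramalama pull granite3-dense
ramalama pull granite3-moe

# Or name the transport explicitly
ramalama pull ollama://granite3-dense
```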

ericcurtin commented 2 hours ago

> Can ramalama also pull the HF ModelCard, so as to make use of it during push?

I don't see why not 😄 we would merge functionality like this.
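For illustration: on Hugging Face the model card is just the repo's `README.md`, served from the raw-file endpoint, so a first sketch of fetching it could be as simple as this (the repo id is one model from the linked Granite 3 collection, used here only as an example):

```
# Hypothetical sketch: download the model card alongside the pulled model
curl -sL https://huggingface.co/ibm-granite/granite-3.0-8b-instruct/raw/main/README.md \
  -o granite-3.0-8b-instruct.modelcard.md
```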

tarilabs commented 2 hours ago

Thanks for the feedback in https://github.com/containers/ramalama/issues/367#issuecomment-2435172868 and https://github.com/containers/ramalama/issues/367#issuecomment-2435173992!

> I can't seem to find GGUFs for those on Hugging Face

So, for my understanding: is GGUF the only supported format, or are there other supported formats? 🤔 (sorry if this is a banal question 😅)

ericcurtin commented 1 hour ago

> So, for my understanding: is GGUF the only supported format, or are there other supported formats?

Right now only .gguf works well. We are open to supporting other formats and other runtimes (llama.cpp and vllm are two that are planned).
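So if and when GGUF builds of Granite 3 appear on Hugging Face, pulling one directly should look something like this (the repo and file names are placeholders, not a real upload):

```
# Hypothetical: pull a .gguf file straight from a Hugging Face repo
ramalama pull huggingface://<org>/<model>-GGUF/<model>.Q4_K_M.gguf
```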

As with most features, it often comes down to whether someone with the time can open a PR!