JBGruber / rollama

https://jbgruber.github.io/rollama/

Ollama now supports embedding models #5

Open kasperwelbers opened 7 months ago

kasperwelbers commented 7 months ago

Ollama added support for embedding models like BERT. This is much faster than using a generative model, such as llama2, which is currently the default in embed_text.

Changing this default, and perhaps adding documentation to help people pick good embedding models, could make rollama super useful for all sorts of downstream tasks in R!
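
For illustration, a minimal sketch of what that could look like with rollama's existing functions, assuming a running Ollama server with embedding support (the model name is just an example):

```r
library(rollama)

# pull a dedicated embedding model once (example model name)
pull_model("nomic-embed-text")

# embed a couple of texts; embed_text() returns one embedding per input text
texts <- c("rollama wraps the Ollama API in R",
           "BERT-style models are tuned for producing embeddings")
emb <- embed_text(texts, model = "nomic-embed-text")
dim(emb)
```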

JBGruber commented 7 months ago

Nice! It didn't work until I updated to v0.1.29 (0.1.26 is apparently the minimum). But then nomic-embed-text was about 4 times faster than the default llama2 model in the embedding vignette example (and the mean F1 of the resulting model was 0.05 better :wink:).

I'm still thinking about the best approach for this. Having one default throughout the package is neat, but models meant for embedding are definitely faster and make more sense for a lot of people. I will at least add it to the vignette and the examples.
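
One possible middle ground, as a sketch: keep a single package default but let users override it per session or per call (this assumes rollama reads a `rollama_model` option for its default, which may not hold exactly):

```r
library(rollama)

# switch the session default to an embedding model before bulk embedding
# (assumption: rollama consults the rollama_model option for its default)
options(rollama_model = "nomic-embed-text")
emb <- embed_text(c("first text", "second text"))

# or pass the model explicitly per call and leave the default untouched
emb <- embed_text(c("first text", "second text"), model = "nomic-embed-text")
```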

JBGruber commented 7 months ago

It would also be good to document how arbitrary embedding models from Hugging Face can be used. I'm not sure if the process is the same for these models as what is documented here: https://github.com/ollama/ollama/blob/main/docs/import.md

JBGruber commented 7 months ago

[Post removed]

This only worked because I grabbed the wrong modelfile. It's actually more complicated...

kasperwelbers commented 7 months ago

Nice, that's really cool!

What is the purpose of Python here? Is it only used to download the model? If so, this might also be done with hfhub, which seems to be an effort by the Posit team to bring Hugging Face to R.

JBGruber commented 7 months ago

That's exactly what I was looking for! For some reason it didn't show up in my searches and I assumed I had dreamed it :sweat_smile:. Yes, the Python stuff was just for downloading the file. Now all we need is a good heuristic to identify the file Ollama wants.
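
For reference, downloading a model repo with hfhub could look something like this (a sketch; the repo id is just an example, and `hub_snapshot()` is assumed to return the local snapshot path):

```r
library(hfhub)

# download a full snapshot of a model repository from Hugging Face
# (example repo id; replace with the embedding model you want)
local_dir <- hub_snapshot("nomic-ai/nomic-embed-text-v1")

# list the downloaded files to figure out which one Ollama needs
list.files(local_dir, recursive = TRUE)
```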

JBGruber commented 7 months ago

Ok, I was a bit quick with the post above: I couldn't reproduce it with the files downloaded through hfhub. Eventually, I noticed I had accidentally grabbed the wrong model file.

You do indeed need to first follow the steps to convert the model using convert-hf-to-gguf.py, and then move the converted bin file to a directory Ollama has access to (in my case, inside the container).
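
Roughly, that workflow could be scripted from R like this. This is only a sketch under several assumptions: convert-hf-to-gguf.py comes from a llama.cpp checkout on the PATH, the output filename and the Modelfile contents follow ollama's import.md, and "my-embedder" is a made-up model name:

```r
# path to the downloaded model, e.g. the snapshot from the hfhub sketch above
model_dir <- "path/to/downloaded-model"

# convert the Hugging Face weights to GGUF with llama.cpp's converter script
# (assumes python and a llama.cpp checkout are available)
system2("python", c("llama.cpp/convert-hf-to-gguf.py", model_dir,
                    "--outfile", "model.gguf"))

# write a minimal Modelfile and register the converted model with Ollama
# ("my-embedder" is a hypothetical name; see ollama's import.md for details)
writeLines("FROM ./model.gguf", "Modelfile")
system2("ollama", c("create", "my-embedder", "-f", "Modelfile"))
```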

So for now, I would tell people to rely on either nomic-embed-text or all-minilm and check what might be added in the future.