elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0
1.26k stars, 90 forks

add support for Nomic embed model v1 & 1.5 #350

Closed by nyo16 4 months ago

nyo16 commented 4 months ago

The Nomic embedding model looks promising and supports 8k-token sequences. HF link

jonatanklosko commented 4 months ago

Hey @nyo16, the source code for Nomic currently lives on the HF Hub (here) rather than in hf/transformers, which means it diverges from BERT in some details. At a quick glance, it uses rotary embedding scaling (to support longer token inputs), which is not available in BERT, and there may be other differences. Code on the Hub is typically used for new models, and it can change while the model is still being improved. Popular models are eventually added to hf/transformers once they stabilise. Once Nomic is added there, we can add it as well :)
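
For readers unfamiliar with the term, here is a rough sketch of what rotary embedding scaling can look like. It is not taken from the Nomic code on the Hub or from Bumblebee. The function names, the interleaved pairing of dimensions, the base of 10000, and the choice of plain linear position scaling are all assumptions made for illustration, and the actual model may use a different scaling scheme.

```python
import math


def rotary_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angle for each pair of dimensions at a given position.

    Assumption: linear position scaling, where scale > 1 compresses
    positions so longer inputs map into the originally trained range.
    """
    scaled_position = position / scale
    return [
        scaled_position * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def apply_rotary(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... of vec.

    Assumption: interleaved pairing; some implementations pair the
    first half of the vector with the second half instead.
    """
    angles = rotary_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


# With scale=4.0, position 4096 gets the same angles as position 1024
# would get without scaling.
query = [1.0, 2.0, 3.0, 4.0]
rotated = apply_rotary(query, position=4096, scale=4.0)
```

The rotation only changes the direction of each pair, not its length, so the scaling factor affects how positions are encoded without changing vector norms. BERT uses learned absolute position embeddings instead, which is why this part has no counterpart there.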