nyo16 closed 4 months ago
Hey @nyo16, the source code for Nomic is currently on the HF Hub (here), which means it diverges from BERT in some details. At a quick glance, it uses rotary embedding scaling (to support longer token inputs), which is not available in BERT, and there may be other differences. Code on the Hub is used for new models, and it can change while the model is still being improved. Popular models are eventually added to hf/transformers once they stabilise. Once it is added to hf/transformers, we can add it as well :)
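For anyone wondering what "rotary embedding scaling" refers to, here is a rough sketch of the general idea in plain Python. It is not taken from the Nomic code on the Hub: the function names, the `scale` parameter, the interleaved pairing of dimensions, and the choice of simple linear position scaling are all assumptions for illustration, and the actual model may use a different scaling scheme.

```python
import math


def rotary_angles(position, dim, base=10000.0, scale=1.0):
    """Return one rotation angle per pair of dimensions at `position`.

    Hypothetical helper: `scale` > 1 divides the position (linear position
    scaling), so positions beyond the trained range map back into it.
    """
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [(position / scale) * f for f in inv_freq]


def apply_rotary(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by their angles.

    Assumes an even-length vector and interleaved pairing; real
    implementations may pair dimensions differently.
    """
    angles = rotary_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out
```

With `scale=4.0`, position 4096 gets the same angles as position 1024 without scaling, which is how this kind of scheme stretches a model to longer inputs. BERT uses learned absolute position embeddings instead, so there is nothing equivalent to reuse there.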
The Nomic embedding model looks promising and supports 8k-token sequences. HF link