MinishLab / model2vec

The Fastest State-of-the-Art Static Embeddings in the World
https://minishlab.github.io/
MIT License
485 stars, 20 forks

Add `model2vec` to config.json #134

Open: davidmezzetti opened this issue 1 day ago

davidmezzetti commented 1 day ago

Hello.

I'm planning to add a change to txtai to autodetect model2vec models. The best idea I have right now is to read the config.json file and check whether it has the keys apply_pca and apply_zipf.

While I believe that combination of keys is fairly unique, have you guys considered adding something to config.json that signals it's a model2vec model?

Pringled commented 1 day ago

Hey @davidmezzetti,

That's a great suggestion! We will add the following to our config.json files:

"model_type": "model2vec",
"architectures": [
  "StaticModel"
],

And then you can use the model_type key to check for model2vec models. I'll ping you once we've made that change.
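Once the `model_type` key is in place, a check on an already-parsed config.json dict might look like the following. `is_model2vec` is a hypothetical helper, and falling back to the apply_pca/apply_zipf heuristic for configs published before this change is an assumption, not something the maintainers specified.

```python
from typing import Any, Mapping


def is_model2vec(config: Mapping[str, Any]) -> bool:
    """Return True if a parsed config.json appears to describe a model2vec model."""
    # Preferred signal: the model_type key proposed in this thread
    model_type = config.get("model_type")
    if model_type is not None:
        return model_type == "model2vec"
    # Assumed fallback for older configs without model_type:
    # require both model2vec-specific keys
    return "apply_pca" in config and "apply_zipf" in config
```

Treating `model_type` as authoritative whenever it is present keeps the check simple; the key heuristic only comes into play for configs that predate the new field.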

davidmezzetti commented 1 day ago

Sounds great! This will make things easier in my case, since txtai works with multiple vectorization libraries. Once this change is in, txtai will be able to automatically infer the vectorization method for model2vec models.

import txtai

# With autodetection, txtai infers the model2vec method from the model's config.json
embeddings = txtai.Embeddings(path="minishlab/potion-base-8M")