New update asking me to redo embeddings.

brianpetro / obsidian-smart-connections

Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3

https://smartconnections.app

GNU General Public License v3.0

2.66k stars, 179 forks

New update asking me to redo embeddings. #440

Open bbecausereasonss opened 8 months ago

bbecausereasonss commented 8 months ago

I have a TON of files. I just updated smart-connections and it's asking me to redo embeddings, except now for notes/blocks... I'm confused. How do I use my existing embeddings? I believe you were defaulting to OpenAI Ada before...

I really don't want to spend $80 on making new ones.

brianpetro commented 8 months ago

@bbecausereasonss v2 isn't backward compatible, but the software fails, even v1, with just a few dollars of embeddings. If you're using the OpenAI API, try the text-embedding-3-small model; it's cheaper than Ada. You can also use the local models, which are free for embedding.

🌴
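
For a rough sense of scale, embedding cost is just the token count times the per-token price. The sketch below is not part of the plugin: the function name is hypothetical, and the prices are assumptions (USD per 1M tokens, as listed by OpenAI around the time the v3 embedding models launched), so check current pricing before relying on them.

```typescript
// Hypothetical helper, not part of Smart Connections.
// Prices are ASSUMPTIONS in USD per 1M tokens; verify against current OpenAI pricing.
const ASSUMED_PRICE_PER_MILLION_TOKENS: Record<string, number> = {
  "text-embedding-ada-002": 0.10,
  "text-embedding-3-small": 0.02,
  "text-embedding-3-large": 0.13,
};

// Estimated cost in USD to embed `totalTokens` tokens once with `model`.
function estimateEmbeddingCost(totalTokens: number, model: string): number {
  const price = ASSUMED_PRICE_PER_MILLION_TOKENS[model];
  if (price === undefined) {
    throw new Error(`No assumed price for model: ${model}`);
  }
  return (totalTokens / 1_000_000) * price;
}

// Example: under the assumed Ada price, an $80 job is roughly 800M tokens.
const vaultTokens = 800_000_000;
for (const model of Object.keys(ASSUMED_PRICE_PER_MILLION_TOKENS)) {
  console.log(model, estimateEmbeddingCost(vaultTokens, model).toFixed(2));
}
```

Under those assumed prices, the small model comes out at about a fifth of the Ada cost for the same token count.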

bbecausereasonss commented 8 months ago

Gotcha. Any recommendations on choosing an embedding model?

https://www.pinecone.io/learn/openai-embeddings-v3/

It seems to say that the large model at 256 dimensions still performs better than ada-002 at 1536 dimensions.

brianpetro commented 8 months ago

It's tough to say; a lot depends on individual use cases.

Smaller models for blocks would be a good idea if you're using local models, because they will be faster.

The OpenAI models, large & small, are both better than the Ada model. The reduced-dimension versions of the large & small models were chosen because they performed equivalently to Ada (with one being cheaper and the other maximizing performance per dimension/storage space).

Overall, it's definitely something to play around with. And more models will continuously be added, so there is a lot to explore.

I hope that helps!

🌴
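
Since the reduced-dimension variants come up here, this is a minimal sketch of what shortening an embedding means, assuming the approach OpenAI documents for the v3 models: keep the first N values and re-normalize to unit length (the API also accepts a `dimensions` parameter that does this server-side). The function names are hypothetical, and the storage estimate assumes 4-byte floats, which may not match how the plugin actually stores vectors.

```typescript
// Hypothetical helpers, not part of Smart Connections.

// Keep the first `dims` values of an embedding and rescale to unit length,
// so cosine similarity / dot product comparisons still behave as expected.
function shortenEmbedding(vec: number[], dims: number): number[] {
  if (dims <= 0 || dims > vec.length) {
    throw new Error(`dims must be between 1 and ${vec.length}`);
  }
  const cut = vec.slice(0, dims);
  const norm = Math.sqrt(cut.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return cut; // avoid dividing by zero for an all-zero vector
  return cut.map((x) => x / norm);
}

// Approximate raw storage for `count` vectors, ASSUMING 4-byte floats.
function storageBytes(count: number, dims: number): number {
  return count * dims * 4;
}

const full = [3, 4, 12, 0];              // toy 4-dimensional "embedding"
const short = shortenEmbedding(full, 2); // [0.6, 0.8]
console.log(short);
console.log(storageBytes(10_000, 1536), storageBytes(10_000, 256));
```

At 256 dimensions each vector needs one sixth of the space of a 1536-dimension one, which is the storage side of the trade-off mentioned above.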

bbecausereasonss commented 8 months ago

Thanks, it does. Going with the small models for now. Do the local models require some type of server, or are they part of the plugin?

brianpetro commented 8 months ago

@bbecausereasonss No setup is required for the local models, if that's what you're asking.

Local models use Hugging Face's transformers.js, which downloads the model from Hugging Face servers, since it's not feasible to include the >100MB, sometimes >1GB, models in the compiled plugin code. But your data stays local, where the model is run, unlike with the OpenAI models, where your data is sent to their servers for embedding.

🌴