getcursor / cursor

The AI Code Editor
https://cursor.com

Allow use of SBERT (or any other open source model) for local embedding computation #727

Status: Open. jalvarado91 opened this issue 1 year ago.

jalvarado91 commented 1 year ago

Is your feature request related to a problem? Please describe.

Cursor seems like a really interesting idea, and the integration points look really good. Being able to provide my own OpenAI key does a good job of giving me some confidence about how my codebase is used. However, one of the big limiting factors for me is that my codebase is still being sent somewhere remote to create embeddings (it looks like it's using OpenAI's embeddings).

Having done the vector dance myself, I understand the purpose of the embeddings, but my codebase going somewhere remote is a deal breaker. Even if that weren't an issue, as a 'bring your own key' user, the cost of indexing a codebase can add up quickly.

Describe the solution you'd like

The solution is the same one I've used in other RAG implementations. Rather than using OpenAI embeddings for semantic search, it would be awesome to be able to provide either a Hugging Face API key or some sort of 'callback' for using embedding models that run on my own computer. There are plenty of SBERT models that produce embeddings good enough to replace OpenAI's at basically zero cost once the model is downloaded, and they don't require a beefy GPU to run.
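To make the 'callback' idea concrete, here is a minimal Python sketch of a semantic-search index that never calls a remote service itself and delegates every embedding to a user-supplied function. All names here (`EmbedFn`, `toy_local_embed`, `LocalIndex`) are hypothetical and are not part of any Cursor API. `toy_local_embed` is a dependency-free stand-in (hashed bag-of-words), not an SBERT model; it only exists so the sketch runs with the standard library alone.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence, Tuple

# Hypothetical hook signature (not a Cursor API): a batch of texts in,
# one embedding vector per text out.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def toy_local_embed(texts: Sequence[str], dim: int = 1024) -> List[List[float]]:
    """Stand-in for a local model: hashed bag-of-words, L2-normalised.

    This is NOT SBERT; it only keeps the sketch dependency-free.
    """
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in re.findall(r"[a-z0-9_]+", text.lower()):
            # crc32 is stable across runs, unlike Python's salted hash().
            vec[zlib.crc32(token.encode()) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class LocalIndex:
    """Toy in-memory vector index; all embedding goes through `embed`."""

    def __init__(self, embed: EmbedFn) -> None:
        self.embed = embed
        self.chunks: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, chunks: Sequence[str]) -> None:
        # Embed the whole batch at once, as a local model would prefer.
        self.chunks.extend(chunks)
        self.vectors.extend(self.embed(chunks))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Embed the query with the same function, then rank by cosine.
        (q,) = self.embed([query])
        scored = [(cosine(q, v), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = LocalIndex(toy_local_embed)
index.add([
    "def parse_config(path): read yaml config file",
    "def render_button(label): draw ui button",
    "def connect_db(url): open database connection",
])
top = index.search("load the yaml config", k=1)
print(top[0][1])
```

Because the index only ever sees vectors, the backend is interchangeable. Assuming the `sentence-transformers` package is installed, a real SBERT model could be plugged in with `model = SentenceTransformer("all-MiniLM-L6-v2")` and `LocalIndex(lambda texts: model.encode(list(texts)).tolist())`. The one constraint is that the stored chunks and the queries must be embedded by the same model, so switching models means re-indexing.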

It seems the core editor isn't fully open source yet, so it's hard to make a PR to demonstrate this, but I'm open to chatting about it.

jjfantini commented 11 months ago

+1