micheleoletti opened this issue 7 months ago
I also ran into this just now, is there a way to make it work?
+1
Same here :/
Ollama Phi3 in the ⌘K
🚀
Takes some effort but it's fast and works well!
@kcolemangt looks amazing! what did you put in the settings to do that? I can't get it to work 🤔
Had to write a custom router and point Cursor's OpenAI Base URL to that. Thinking of releasing it. Want to try?
I see... yeah I'd be interested in trying that out! On what hardware are you running ollama by the way?
released llm-router. lmk how it works for you @micheleoletti
Cursor does not like it when you specify a port in the "Override OpenAI Base URL" field.
If you serve Ollama on the default HTTP port (80), it starts working:
OLLAMA_ORIGINS=* OLLAMA_HOST=127.0.0.1:80 ollama serve (run with sudo -E if binding to port 80 requires it)
Then you can put http://localhost/v1 under "Override OpenAI Base URL" in Cursor.
UPD: For some reason, Cursor is trying to hit the /v1/models endpoint, which is not implemented in Ollama:
[GIN] 2024/05/30 - 11:45:40 | 404 | 200.375µs | 127.0.0.1 | GET "/v1/models"
That causes this error message:
UPD 2: Even after creating a compatible /v1/models endpoint in Ollama, Cursor still refuses to work:
~ curl -k http://127.0.0.1/v1/models
{"data":[{"created":1715857935,"id":"llama3:latest","object":"model","owned_by":"organization-owner"}],"object":"list"}
[GIN] 2024/05/30 - 15:55:17 | 200 | 1.221584ms | 127.0.0.1 | GET "/v1/models"
In Cursor's Dev Tools I get:
ConnectError: [not_found] Model llama3:latest not found
Tried both llama3 and llama3:latest.
Seems like Cursor has something hardcoded for the localhost/127.0.0.1 address.
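For anyone hitting the same "Model ... not found" error, it helps to compare the exact tags Ollama exposes with what Cursor sends (a rough sketch; the default port 11434 and the tag are assumptions, and newer Ollama builds do implement /v1/models natively):
# list locally available models and their exact tags
ollama list
curl http://127.0.0.1:11434/api/tags
# OpenAI-compatible model listing (404 on older builds, as in the log above)
curl http://127.0.0.1:11434/v1/models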
There is this repo, https://github.com/ryoppippi/proxy-worker-for-cursor-with-ollama, which is an option (but not for offline use). The README also points out that there is direct communication with Cursor's server, which is likely why Cursor hasn't enabled an easy-to-use option for a truly local LLM.
+1
I'm getting an error when using a custom URL, even though the command works in the terminal.
+1
Does llm-router work on Windows? How do I install it?
I've successfully configured Curxy for this purpose: https://github.com/ryoppippi/curxy
Hi @henry2man, can you please elaborate on how you got it to work on Windows? I'm trying to use it with Groq Cloud and also Ollama (locally).
Hi! I actually don't use Windows, so I can't share direct experience with that. However, I did notice that to get it working with Curxy and Ollama on macOS, it's crucial to enable the custom OpenAI API key. Once that's done, you can override the OpenAI Base URL and add a valid API key. That said, from there it's just a matter of following the Curxy README instructions...
I did it for Windows using Ollama:
🚨 Note: Localhost is not working at the moment, so you’ll need to use a tunneling method. For this example, I used Ngrok.
ngrok http 11434 --host-header="localhost:11434"
🔑 We need an OpenAI API key to force Cursor to use our custom URL:
Finally, disable all other models like so and add your custom model. Voilà! 🎉
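Putting those steps together, a minimal sketch of the whole flow (the forwarding URL and model tag are placeholders; on Windows PowerShell the env var is set with $env:OLLAMA_ORIGINS="*" instead):
# 1) allow cross-origin requests and start Ollama on its default port (11434)
OLLAMA_ORIGINS=* ollama serve
# 2) tunnel the port, since plain localhost is not accepted by Cursor at the moment
ngrok http 11434 --host-header="localhost:11434"
# 3) in Cursor: set "Override OpenAI Base URL" to https://<your-ngrok-forwarding-url>/v1,
#    verify with an OpenAI API key as noted above, then add your model name, e.g. llama3.1:latest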
Thanks @Mateleo! I'll give this a shot! BTW, when running Ollama with Cursor, does the "Cursor Prediction" work with custom models like Qwen and Llama 3.1?
This feature:
@hgoona Yes, though it's a bit odd... All the CTRL+K and Chat functions use the model, but I haven't noticed any calls to Ollama for the 4 Tab features (they do work, though). And yet in my settings on the site, I see no calls to external APIs like gpt-4o or others.
+1
Wait - what?! Can I confirm: Does the Cursor Tab Prediction feature work even for Free users + Ollama ??
Mine does not. ??
+1
Thanks, I was looking forward to something like this! I will give it a try too, checking if autocomplete works as well.
UPDATE: I consistently get 403 errors with both LAN and public addresses. The curl command proposed by Cursor works in a terminal, though.
Also, this happens if I still try to execute a query:
And I don't get any request to ollama.
Very weird.
@tcsenpai I had the same issue. Any solution?
Seems like a CORS issue. I never got it to work... Maybe you can disable web-security / CORS somehow?
@tcsenpai I had the same issue. Any solution?
Nope, I ended up trying another extension and the results were not fantastic, so I gave up completely until we get something like llama3.1 for coding.
@she11sh0cked Of course, it's a CORS issue.
To make it work, you just need to enable CORS on the Ollama server.
Set the following environment variable:
launchctl setenv OLLAMA_ORIGINS "*"
and then run the Ollama server.
It works like a charm!
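launchctl setenv is macOS-only; rough equivalents on other platforms (these follow the Ollama docs as far as I know, so double-check them), plus a quick check that origin filtering was really the cause of the 403s:
# Linux / current shell session
export OLLAMA_ORIGINS="*"
ollama serve
# Windows (PowerShell or cmd), then restart Ollama
setx OLLAMA_ORIGINS "*"
# sanity check: with a disallowed Origin this request is rejected,
# after setting OLLAMA_ORIGINS="*" the same request should succeed
curl -i http://127.0.0.1:11434/v1/models -H "Origin: http://example.com"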
@apalabrados does the Free tier of Cursor have access to the "Cursor (tab) Predictions" feature when using Ollama LLMs (plus your CORS solution) ??
Yes! Attached a snapshot.
@apalabrados that's the "text prediction" that you've shown in your snapshot, right? What I'm talking about is the "Cursor Predictions"
It looks like this, where the "Tab" UI button appears in the view:
Does that part also work? In mine, I only see text predictions, but not "Cursor Predictions" with Ollama 🤔 Am I doing something wrong??
@hgoona Hi again... Yes, here is the picture that shows it:
@apalabrados thanks for confirming! Can I ask what Ollama models you are using for that? I'm not seeing that UI feature on mine... Is it because I'm using Groq models? 🤔
@hgoona The following:
Also, check if you have these options enabled:
Guys, I discovered the secret. Basically, for this to work, you need to trick the OpenAI key verification. So here are the steps:
Add your custom model name (llama3.1:latest, for example).
Note: This feature may be patched soon, so if you want to continue using it, avoid updating!
Just noticed that passing API key verification is no longer enough - seems like there's different handling of requests that prevents me from hitting local ollama.
Tried Cursor + Groq (llama-3.1-70b-versatile); tab completions didn't work (the official name for the feature is FIM completion, I think: https://api-docs.deepseek.com/guides/fim_completion).
@apalabrados maybe it worked for you because your free tab completions were not depleted yet.
P.S. Related: https://github.com/continuedev/continue/issues/2742
Can I avoid using Ngrok to expose the API on the network? I use localhost:11434; I can curl and test the LLM API normally in a terminal, but it does not work well in Cursor. Can anyone offer a solution? Well, I could use Ngrok on the company network...
A local terminal POST works well:
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-abcdef1234567890ABCDEFGHIJKLMNOPQRSTUVWXz" -d '{ "messages": [ { "role": "system", "content": "You are a test assistant." }, { "role": "user", "content": "Testing. Just say hi and nothing else." } ], "model": "codellama" }'
Response:
{"id":"chatcmpl-432","object":"chat.completion","created":1731331175,"model":"codellama","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"\nHi!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":35,"completion_tokens":4,"total_tokens":39}}
has anyone got the new Qwen models running locally in cursor? curious about performance on a 4090 or similar
@marcusziade Yeah, I tried the 32B on my MacBook M4 Max. It's around 12 tok/sec. Here's a video on X of me running it with LM Studio and Ngrok in Cursor.
yeah, I saw that. I'm gonna give LM Studio a try. I'm on arch with 7900xtx
https://github.com/user-attachments/assets/b692ad9f-e894-4150-b771-8e3a3d20007c
Runs really well on 2x 3090s. Used ngrok as well to get it working; didn't need anything else.
Hi team, Qwen 2.5-Coder 32B is here and it's rad. Can anybody give us hope regarding implementing Ollama support? Frankly, it's game-changing/deal-breaking for many, including me. I found 7 open issues about implementing local LLMs in this repo.
Thanks! 🙏🙏🙏
It works fine, just use ngrok.
Knowing that the Ollama server supports the OpenAI API (https://ollama.com/blog/openai-compatibility), the goal is to point Cursor to query the local Ollama server.
My setup is pretty simple:
I added a llama2 model, set "ollama" as the API key (not used, but apparently needed), and overrode the base URL to point to localhost. But it does not work:
If I try to verify the API key it seems like it cannot reach localhost:
But if I try the provided test snippet in the terminal, it works correctly:
So it seems like Cursor's internal service is not able to perform the fetch to localhost.
Is there something conceptually wrong with my plan and its implementation? Did anybody manage to make this configuration work?
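For reference, the terminal test mentioned above looks roughly like this (a sketch only; it assumes Ollama's default port 11434, a pulled llama2 model, and "ollama" as a dummy key):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama2", "messages": [{"role": "user", "content": "Say hi."}]}'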