micheleoletti opened this issue 7 months ago
I also ran into this just now, is there a way to make it work?
+1
Same here :/
Ollama Phi3 in the ⌘K
🚀
Takes some effort but it's fast and works well!
@kcolemangt looks amazing! what did you put in the settings to do that? I can't get it to work 🤔
Had to write a custom router and point Cursor's OpenAI Base URL to that. Thinking of releasing it. Want to try?
I see... yeah I'd be interested in trying that out! On what hardware are you running ollama by the way?
released llm-router. lmk how it works for you @micheleoletti
Cursor does not like it when you specify a port in the "Override OpenAI Base URL" field.
If you serve Ollama on the default HTTP port (80), it starts working:
OLLAMA_ORIGINS=* OLLAMA_HOST=127.0.0.1:80 ollama serve (run with sudo -E if binding to port 80 requires it)
Then you can put http://localhost/v1 under "Override OpenAI Base URL" in Cursor.
UPD: For some reason, Cursor is trying to hit the /v1/models endpoint, which is not implemented in Ollama:
[GIN] 2024/05/30 - 11:45:40 | 404 | 200.375µs | 127.0.0.1 | GET "/v1/models"
That causes this error message:
UPD 2: Even after creating a compatible /v1/models endpoint in Ollama, Cursor still refuses to work:
~ curl -k http://127.0.0.1/v1/models
{"data":[{"created":1715857935,"id":"llama3:latest","object":"model","owned_by":"organization-owner"}],"object":"list"}
[GIN] 2024/05/30 - 15:55:17 | 200 | 1.221584ms | 127.0.0.1 | GET "/v1/models"
In Cursor's Dev Tools I get:
ConnectError: [not_found] Model llama3:latest not found
Tried both llama3 and llama3:latest.
Seems like Cursor has something hardcoded for the localhost/127.0.0.1 address.
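For anyone hitting the same "Model ... not found" error, it helps to compare the exact tags Ollama exposes with what Cursor sends (a rough sketch; the default port 11434 and the tag are assumptions, and newer Ollama builds do implement /v1/models natively):
# list locally available models and their exact tags
ollama list
curl http://127.0.0.1:11434/api/tags
# OpenAI-compatible model listing (404 on older builds, as in the log above)
curl http://127.0.0.1:11434/v1/models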
There is this repo, https://github.com/ryoppippi/proxy-worker-for-cursor-with-ollama, which is an option (but not for offline use). The README also points out that there is direct communication with Cursor's server, which is likely why Cursor hasn't enabled an easy-to-use option for a truly local LLM.
+1
I'm getting an error when using a custom URL, even though the command works in the terminal.
+1
Does llm-router work on Windows? How do I install it?
I've successfully configured Curxy for this purpose: https://github.com/ryoppippi/curxy
Hi @henry2man, can you please elaborate on how you got it to work on Windows? I'm trying to use it with Groq Cloud and also Ollama (locally).
Hi! I actually don't use Windows, so I can't share direct experience with that. However, I did notice that to get it working with Curxy and Ollama on macOS, it's crucial to enable the custom OpenAI API key. Once that's done, you can override the OpenAI Base URL and add a valid API key. That said, from there it's just a matter of following the Curxy README instructions...
I did it for Windows using Ollama:
🚨 Note: Localhost is not working at the moment, so you’ll need to use a tunneling method. For this example, I used Ngrok.
ngrok http 11434 --host-header="localhost:11434"
🔑 We need an OpenAI API key to force Cursor to use our custom URL:
Finally, disable all other models like so and add your custom model. Voilà! 🎉
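Putting those steps together, a minimal sketch of the whole flow (the forwarding URL and model tag are placeholders; on Windows PowerShell the env var is set with $env:OLLAMA_ORIGINS="*" instead):
# 1) allow cross-origin requests and start Ollama on its default port (11434)
OLLAMA_ORIGINS=* ollama serve
# 2) tunnel the port, since plain localhost is not accepted by Cursor at the moment
ngrok http 11434 --host-header="localhost:11434"
# 3) in Cursor: set "Override OpenAI Base URL" to https://<your-ngrok-forwarding-url>/v1,
#    verify with an OpenAI API key as noted above, then add your model name, e.g. llama3.1:latest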
Thanks @Mateleo! I'll give this a shot! BTW, when running Ollama with Cursor, does the "Cursor Prediction" work with custom models like Qwen and Llama 3.1?
This feature:
@hgoona Yes, though it's a bit odd... All the CTRL+K and Chat functions use the model, but I haven't noticed any calls to Ollama for the 4 Tab features (they do work, though). And yet in my settings on the site, I see no calls to external APIs like gpt-4o or others.
+1
Wait - what?! Can I confirm: Does the Cursor Tab Prediction feature work even for Free users + Ollama ??
Mine does not. ??
+1
Thanks, I was looking forward to something like this! I will give it a try too, checking if autocomplete works as well.
UPDATE: I consistently get 403 errors with both LAN and public addresses. The curl command proposed by Cursor works in a terminal, though.
Also, this happens if I still try to execute a query:
And I don't get any request to ollama.
Very weird.
@tcsenpai I had the same issue. Any solution?
Seems like a CORS issue. I never got it to work... Maybe you can disable web-security / CORS somehow?
@tcsenpai I had the same issue. Any solution?
Nope, I ended up trying another extension and the results were not fantastic, so I gave up completely until we get something like llama3.1 for coding.
@she11sh0cked Of course, it's a CORS issue.
To make it work, you just need to enable CORS on the Ollama server.
Set the following environment variable:
launchctl setenv OLLAMA_ORIGINS "*"
and then run the Ollama server.
It works like a charm!
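launchctl setenv is macOS-only; rough equivalents on other platforms (these follow the Ollama docs as far as I know, so double-check them), plus a quick check that origin filtering was really the cause of the 403s:
# Linux / current shell session
export OLLAMA_ORIGINS="*"
ollama serve
# Windows (PowerShell or cmd), then restart Ollama
setx OLLAMA_ORIGINS "*"
# sanity check: with a disallowed Origin this request is rejected,
# after setting OLLAMA_ORIGINS="*" the same request should succeed
curl -i http://127.0.0.1:11434/v1/models -H "Origin: http://example.com"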
@apalabrados does the Free tier of Cursor have access to the "Cursor (tab) Predictions" feature when using Ollama LLMs (plus your CORS solution) ??
Yes! Attached a snapshot.
@apalabrados that's the "text prediction" that you've shown in your snapshot, right? What I'm talking about is the "Cursor Predictions"
It looks like this, where the "Tab" UI button appears in the view:
Does that part also work? In mine, I only see text predictions, but not "Cursor Predictions" with Ollama 🤔 Am I doing something wrong??
@hgoona Hi again... Yes, here is the picture that shows it:
@apalabrados thanks for confirming! Can I ask what Ollama models you are using for that? I'm not seeing that UI feature on mine... Is it because I'm using Groq models? 🤔
@hgoona The following:
Also, check if you have these options enabled:
Guys, I discovered the secret. Basically, for this to work, you need to trick the OpenAI key verification. So here are the steps:
Add your custom model name (llama3.1:latest, for example).
Note: This feature may be patched soon, so if you want to continue using it, avoid updating!
Just noticed that passing API key verification is no longer enough - seems like there's different handling of requests that prevents me from hitting local ollama.
Tried Cursor + Groq (llama-3.1-70b-versatile); tab completions didn't work (the official name for the feature is FIM completion, I think: https://api-docs.deepseek.com/guides/fim_completion).
@apalabrados maybe it worked for you because your free tab completions were not depleted yet.
P.S. Related: https://github.com/continuedev/continue/issues/2742
Can I avoid using Ngrok to expose the API on the network? I use localhost:11434; I can curl and test the LLM API normally in a terminal, but it does not work well in Cursor. Can anyone offer a solution? Well, I could use Ngrok on the company network...
A local terminal POST works well:
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-abcdef1234567890ABCDEFGHIJKLMNOPQRSTUVWXz" -d '{ "messages": [ { "role": "system", "content": "You are a test assistant." }, { "role": "user", "content": "Testing. Just say hi and nothing else." } ], "model": "codellama" }'
Response:
{"id":"chatcmpl-432","object":"chat.completion","created":1731331175,"model":"codellama","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"\nHi!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":35,"completion_tokens":4,"total_tokens":39}}
has anyone got the new Qwen models running locally in cursor? curious about performance on a 4090 or similar
@marcusziade Yeah, I tried the 32B on my MacBook M4 Max. It's around 12 tok/sec. Here's a video on X of me running it with LM Studio and Ngrok in Cursor.
yeah, I saw that. I'm gonna give LM Studio a try. I'm on arch with 7900xtx
https://github.com/user-attachments/assets/b692ad9f-e894-4150-b771-8e3a3d20007c
Runs really well on 2x 3090s. Used ngrok as well to get it working; didn't need anything else.
Hi team, Qwen 2.5-Coder 32B is here and it's rad. Can anybody give us hope regarding implementing Ollama support? Frankly, it's game-changing/deal-breaking for many, including me. I found 7 open issues about implementing local LLMs in this repo.
Thanks! 🙏🙏🙏
It works fine, just use ngrok.
Knowing that the Ollama server supports the OpenAI API (https://ollama.com/blog/openai-compatibility), the goal is to point Cursor to query the local Ollama server.
My setup is pretty simple:
I added a llama2 model, set "ollama" as the API key (not used, but apparently needed), and overrode the base URL to point to localhost. But it does not work:
If I try to verify the API key it seems like it cannot reach localhost:
But if I try the provided test snippet in the terminal, it works correctly:
So it seems like Cursor's internal service is not able to perform the fetch to localhost.
Is there something conceptually wrong with my plan and its implementation? Did anybody manage to make this configuration work?
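For reference, the terminal test mentioned above looks roughly like this (a sketch only; it assumes Ollama's default port 11434, a pulled llama2 model, and "ollama" as a dummy key):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama2", "messages": [{"role": "user", "content": "Say hi."}]}'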