eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

SSH port forwarding to websocket application #147

Open Bipasha-banerjee opened 1 year ago

Bipasha-banerjee commented 1 year ago

Hi, I am using the directions given at https://docs.lmql.ai/en/latest/language/hf.html to run a local transformer model on a remote server. However, SSH port forwarding results in the error "No WebSocket UPGRADE hdr: None Can "Upgrade" only to "WebSocket"". I am wondering if anyone has faced the same issue and how to resolve it.

lbeurerkellner commented 1 year ago

I typically run serve-model remotely with ssh -L 8080:localhost:8080, which should do the trick. Can you share more information about your SSH setup? Are there possibly limitations on the type of traffic allowed via 8080, or is something already listening on that port? Maybe you could try running with a different --port?
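For reference, this is roughly the sequence I have in mind (the model name, user, host, and the alternate port 9090 are just placeholders for your setup):

```bash
# on the remote machine: check whether something else is already bound to 8080
lsof -i :8080

# serve the model on a free port instead (9090 is only an example)
lmql serve-model <model_name> --port 9090

# on the local machine: forward that same port over SSH
ssh -L 9090:localhost:9090 <user>@<remote-host>
```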

Bipasha-banerjee commented 1 year ago

Thanks for the response. I have tried different ports using the --port flag, but unfortunately with the same result. I have used port forwarding with this remote server before. I understand that the server exposes a WebSocket endpoint [Serving LMTP endpoint on ws://localhost:8001/]. I have never used port forwarding with a WebSocket before and was wondering if there is any special connection configuration that needs to be added or modified.

lbeurerkellner commented 1 year ago

Coming back to this: on a second look, it appears that you are directly visiting localhost:8001. Could you clarify how you are trying to make use of the remote model? In general, you will need a local playground (lmql playground) and the remotely hosted model (lmql serve-model). The local playground will then connect to your remote model via the forwarded port, but the playground itself will still be served on localhost:3000.
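Roughly, the full flow looks like this (user and host are placeholders; I am using the port 8001 from your log above):

```bash
# 1. on the remote machine (with the GPU): serve the model over LMTP
lmql serve-model <model_name> --cuda --port 8001

# 2. on the local machine: forward the LMTP port to the remote machine
ssh -L 8001:localhost:8001 <user>@<remote-host>

# 3. on the local machine: start the playground, which is served on localhost:3000
lmql playground
```

The playground then talks to the forwarded port instead of loading a model locally.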

Please let me know, so we can find a solution to this :)
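PS: as a quick sanity check of the tunnel itself (assuming you have the websockets Python package installed locally), you can open the forwarded WebSocket directly with its command-line client. Note that visiting the port with a plain HTTP request, e.g. in a browser, is expected to produce exactly the "Can "Upgrade" only to "WebSocket"" message you quoted:

```bash
# on the local machine, with the SSH tunnel up:
python -m websockets ws://localhost:8001/
```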

Bipasha-banerjee commented 1 year ago

Thanks for the response. I understand the flow for the most part, but I am stuck on how the local playground connects to the remote model. The local playground is served on localhost:3000, and the remote lmql serve-model is on port 8001. I used the command argmax "Hello[WHO]" from lmql.model("modelname", endpoint="localhost:8001"), but I get a compilation error.

lbeurerkellner commented 1 year ago

I see, yes. The problem here is that we currently do not support a from clause when you omit the decoder keyword. So instead, just write:

argmax "Hello[WHO]" from lmql.model("modelname", endpoint="localhost:8001")

This should do the trick. Let me know.

Bipasha-banerjee commented 1 year ago

Hi,

Thanks for the suggestion. I am still facing a compilation error. I am writing down the exact steps I take so that you can correct me if I am missing anything.

  1. I have an environment with lmql installed on both my local and remote machines. My remote machine has the model and a GPU to serve it.
  2. I run lmql serve-model model_name --cuda --port 8001 (after setting up SSH port forwarding).
  3. I run the local playground using the lmql playground command.
  4. I use argmax "Hello[WHO]" from lmql.model("model_name", endpoint="localhost:8001")
  5. I get a compilation error with messages like "cannot access repo" and "could not load tokenizer". From what I understand, it is trying to find the model on the local machine rather than connecting to the endpoint port I provided.

Hope all this made sense. Thanks again for helping me with this. Appreciate it.

PS: I've tried running the playground on the remote machine as well and accessing it via SSH tunneling. The Run button is disabled in that case.

Regards, Bipasha