eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Running non-reloading hosted version of playground #10

Open ajbouh opened 1 year ago

ajbouh commented 1 year ago

Hi, I've gotten the model serving endpoint up and running in a private instance, but I'm having difficulty starting a simple playground UI that can interact with it.

Am I missing anything obvious about which entry points or config to use?

Would I be better off using the Pyodide-based UI like the public playground uses?

lbeurerkellner commented 1 year ago

Thanks for reaching out. The title suggests that you have issues with the auto-reloading nature of the playground. Is this the case? Or is the issue non-responsiveness due to limited resources?

It may be worth noting that model serving (`lmql serve-model`) is a separate process from the playground (`lmql playground`). You have to run both to use local models with a local instance of the playground.
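Concretely, a minimal local setup looks something like the sketch below (the model name is illustrative; the default ports are the ones mentioned elsewhere in this thread):

```bash
# Terminal 1: serve a model locally (model name is just an example).
# The playground expects the serving backend at localhost:8080 by default.
lmql serve-model gpt2

# Terminal 2: start the playground.
# This serves the React UI on port 3000 and the live LMQL process on port 3004.
lmql playground
```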

ajbouh commented 1 year ago

I have two issues:

  1. I don't think I want to use a hot-reloading backend. In hosted mode I'm not changing any implementation code worth reloading.

  2. I want to register a different address for the model serving backend than just `localhost:8080`.

lbeurerkellner commented 1 year ago

Thanks, these suggestions seem reasonable.

My plan for this issue would then be to make the non-reloading setup the default, while keeping a separate "dev" playground with full hot-reloading.

This way, language users will benefit by default, and language developers still get hot-reloading when working on LMQL itself. For now, running the playground without hot-reloading is a bit tedious, so I will see to it that I get to this soon.

ajbouh commented 1 year ago

Thanks so much for talking all this out with me.

Can you help me understand the exact generated files I need? I can of course share a patch with the final scripts I get working!

I would prefer to do a build first and then package up only the files I need to run the playground.

If you can point me to the specific files and line numbers with the core functionality, that might be enough for me to figure out the rest.

Along these lines, why do we need both the debug UI and the live UI running on separate ports?

lbeurerkellner commented 1 year ago

If you want to know how a standalone playground distribution (for in-browser use) can be packaged up, have a look at https://github.com/eth-sri/lmql/blob/main/web/deploy.sh. If you don't set `REACT_APP_WEB_BUILD=1`, the resulting playground build will use a local live server at `localhost:3000`.
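As a rough sketch of what a web-packaged build invocation might look like (the directory and the `npm run build` step are assumptions about the React setup; `web/deploy.sh` is the authoritative recipe):

```bash
# Hypothetical build of the in-browser playground distribution.
# REACT_APP_WEB_BUILD is mentioned above; the path and build command are assumptions.
cd src/lmql/ui/playground           # illustrative location of the React UI
REACT_APP_WEB_BUILD=1 npm run build
```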

The live UI is served via React (port 3000), and the communication with LMQL is done via a separate process based on https://github.com/eth-sri/lmql/blob/main/src/lmql/ui/live/live.py (port 3004). On 3004, we actually serve a hot-reloading version of LMQL, which makes this setup very useful if you are actively changing LMQL internals.
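In other words, a running local playground consists of two local endpoints. Assuming the live.py process responds to plain HTTP requests, a quick sanity check might look like:

```bash
# Hedged sanity check that both playground processes are up (ports as described above).
curl -s -o /dev/null -w "React UI (3000): %{http_code}\n" http://localhost:3000
curl -s -o /dev/null -w "live.py  (3004): %{http_code}\n" http://localhost:3004
```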

It would be interesting to implement a fully packaged playground, where neither the UI nor LMQL is hot-reloading. This would largely eliminate per-query startup time, as the LMQL interpreter could be kept alive across different queries. This is already done in the browser-packaged playground at http://lmql.ai/playground, where you will observe faster startup times. However, in the browser distribution the tokenizer is slower, so there would definitely be an upside to a fully packaged, local playground.

I think the next step after this should be either an Electron-based "LMQL Studio" application packaging both the interpreter and the playground, or an extended VSCode extension that also integrates all playground functionality.