atisharma closed this issue 12 months ago
Why not just use the ooba api, or Kobold? All the backend stuff goes away.
It does use the ooba API, and can talk to multiple instances, but it does not know whether a server is busy.
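To illustrate the missing piece, here is a minimal sketch of client-side busy tracking over a pool of backends. The endpoint URLs, the `LLMPool` class and the `generate` helper are all hypothetical names for illustration, not chasm's actual implementation:

```python
import threading
from contextlib import contextmanager

class LLMPool:
    """Client-side pool that tracks which backend servers are busy.

    A request checks out a free endpoint, and marks it busy until the
    request completes; new requests block until some endpoint is free.
    """
    def __init__(self, endpoints):
        self._cond = threading.Condition()
        self._free = list(endpoints)

    @contextmanager
    def endpoint(self):
        # Block until some server is free, then hold it for the caller.
        with self._cond:
            while not self._free:
                self._cond.wait()
            url = self._free.pop()
        try:
            yield url
        finally:
            # Return the endpoint to the pool and wake one waiter.
            with self._cond:
                self._free.append(url)
                self._cond.notify()

# Hypothetical OpenAI-compatible backends.
pool = LLMPool(["http://gpu-a:5000/v1", "http://gpu-b:5000/v1"])

def generate(prompt):
    with pool.endpoint() as url:
        # A real client would POST the prompt to url + "/completions";
        # here we just return which endpoint was chosen.
        return url
```

With this shape, two concurrent requests land on different servers, and a third waits rather than queueing behind a busy instance.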
Oh! Do you have install instructions somewhere?
https://github.com/atisharma/chasm_engine#installing-and-running Detailed instructions depend on your setup, but that discussion doesn't belong in this issue; please open a new one.
Added Replicate support; it can also use cloud OpenAI-compatible providers.
Using tgwui is OK, but it may be better to send requests to a pool of LLM servers running, for instance, https://mlc.ai/mlc-llm/, and to communicate over zmq.
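The pattern this suggests is ZeroMQ's load-balancing broker, where each worker announces itself when idle so the dispatcher only ever sends work to a server that is provably free. A sketch of that dispatch logic, using stdlib queues as stand-ins for zmq ROUTER/REQ sockets (names and worker behaviour are illustrative assumptions):

```python
import queue
import threading

def llm_worker(name, ready, inbox):
    """Each worker announces itself on the shared `ready` queue whenever it
    is idle, so the dispatcher never sends work to a busy server."""
    while True:
        ready.put((name, inbox))        # announce: this server is idle
        prompt, reply_box = inbox.get()
        if prompt is None:              # shutdown signal
            break
        # A real worker would forward the prompt to an MLC-LLM / ooba
        # instance here; we just echo which server handled it.
        reply_box.put(f"{name} handled: {prompt}")

def dispatch(ready, prompt):
    # Block until some worker is idle, then hand it the request.
    name, inbox = ready.get()
    reply_box = queue.Queue()
    inbox.put((prompt, reply_box))
    return reply_box.get()

ready = queue.Queue()
worker_inboxes = []
for name in ("gpu-a", "gpu-b"):        # hypothetical server names
    inbox = queue.Queue()
    threading.Thread(target=llm_worker, args=(name, ready, inbox),
                     daemon=True).start()
    worker_inboxes.append(inbox)
```

Swapping the in-process queues for zmq sockets over tcp gives the same busy-awareness across machines, which is what a plain HTTP round-robin against tgwui instances lacks.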