chrisrude / oobabot

A Discord bot which talks to Large Language Model AIs running on oobabooga's text-generation-webui
MIT License

Generation is very slow #47

Closed Urammar closed 1 year ago

Urammar commented 1 year ago

Generation of the reply in Discord is very slow compared to ordinary web UI generation. Is this a bug, perhaps?

Rate of generation

tokens: 11, time: 13.77s, latency: 13.39s, rate: 0.80 tok/s

vs

Output generated in 4.70 seconds (2.13 tokens/s, 10 tokens, context 82, seed 1774856757)

jmoney7823956789378 commented 1 year ago

It should even out over longer generations, for example if you asked it to tell you a long story. Unfortunately, the "warmup time" involved when using oobabot is likely unavoidable.
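
To illustrate the point, here is a rough sketch in Python using the numbers from the oobabot log line above. It assumes that "latency" is the time spent before the first token arrives, that this cost is paid once per reply, and that the remaining time is spread evenly across the tokens. None of this is confirmed by the log itself, and the function name is made up for the example. Under those assumptions, latency is about 97% of the 13.77s total, so the overall rate should climb as replies get longer.

```python
# Figures copied from the oobabot log line in this issue.
TOKENS = 11
TOTAL_S = 13.77
LATENCY_S = 13.39  # assumption: time elapsed before the first token arrived

# Assumption: everything that is not latency is spent emitting tokens
# at a steady pace, so this is the per-token cost after the first token.
PER_TOKEN_S = (TOTAL_S - LATENCY_S) / TOKENS


def effective_rate(tokens: int) -> float:
    """Overall tok/s when a fixed latency is paid once per reply (hypothetical model)."""
    return tokens / (LATENCY_S + tokens * PER_TOKEN_S)


if __name__ == "__main__":
    print(f"latency share of this reply: {LATENCY_S / TOTAL_S:.0%}")
    for n in (11, 100, 500):
        print(f"{n:>4} tokens -> {effective_rate(n):.2f} tok/s")
```

For 11 tokens this reproduces the logged 0.80 tok/s. The figures for longer replies are only as good as the assumptions, so treat them as a direction rather than a prediction.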

Urammar commented 1 year ago

Gotcha, cheers

chrisrude commented 1 year ago

@Urammar I'm curious how you're measuring the local comparison speed. Is it using the same prompt that oobabot is sending?

If you want to see the full prompt, you can run oobabot with --log-all-the-things, and the full prompt text (and response) will be printed to stdout. You can then copy/paste from there into the Oobabooga UI. Please let me know if it's much different.