It should normalize over longer inference runs, for example if you ask it to tell you a long story. Unfortunately, the "warmup time" involved when using oobabot is likely unavoidable.
Gotcha, cheers
@Urammar I'm curious how you're measuring the local comparison speed. Is it using the same prompt that oobabot is sending?
If you want to see the full prompt, you can run oobabot with --log-all-the-things, and the full prompt text (and response) will be printed to stdout. You can then copy/paste it from there into the Oobabooga UI. Please let me know if it's much different.
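If you'd rather script the comparison, here's a minimal sketch (not part of oobabot) that times the same prompt against the webui's legacy blocking API. It assumes the webui was started with --api and is listening on localhost:5000; the /api/v1/generate endpoint and response shape may differ on newer builds, so adjust the URL and parameters for your setup.

```python
# Hypothetical timing sketch: send the prompt captured from --log-all-the-things
# to the text-generation-webui legacy blocking API and time the generation.
# Assumes the webui was launched with --api on localhost:5000 (adjust as needed).
import time
import requests

API_URL = "http://localhost:5000/api/v1/generate"  # legacy blocking endpoint

def time_generation(prompt: str, max_new_tokens: int = 80) -> None:
    start = time.monotonic()
    response = requests.post(
        API_URL,
        json={"prompt": prompt, "max_new_tokens": max_new_tokens},
        timeout=300,
    )
    elapsed = time.monotonic() - start
    response.raise_for_status()
    text = response.json()["results"][0]["text"]
    # Words are not tokens, so treat this as a rough ballpark rate only.
    approx_tokens = len(text.split())
    print(f"elapsed: {elapsed:.2f}s, ~{approx_tokens / elapsed:.2f} tok/s")
    print(text)

if __name__ == "__main__":
    # Paste the full prompt copied from the oobabot logs here.
    time_generation("### Example prompt copied from oobabot logs ###")
```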
Generation of the reply in Discord is very slow compared to ordinary webui generation. Is this a bug, perhaps?
Rate of generation
tokens: 11, time: 13.77s, latency: 13.39s, rate: 0.80 tok/s
vs
Output generated in 4.70 seconds (2.13 tokens/s, 10 tokens, context 82, seed 1774856757)
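For reference, a quick back-of-the-envelope check of the two figures above (plain arithmetic, nothing oobabot-specific) shows roughly how far apart the rates are:

```python
# Compare the two reported generation rates from the logs above.
oobabot_rate = 11 / 13.77   # oobabot: 11 tokens in 13.77s -> ~0.80 tok/s
webui_rate = 10 / 4.70      # webui:   10 tokens in 4.70s  -> ~2.13 tok/s
print(f"oobabot: {oobabot_rate:.2f} tok/s")
print(f"webui:   {webui_rate:.2f} tok/s")
print(f"webui is ~{webui_rate / oobabot_rate:.1f}x faster")  # ~2.7x
```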