Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Follow-up answers are slow #408

Closed: woheller69 closed this issue 2 months ago

woheller69 commented 4 months ago

I have a CPU-only setup, so my system is quite slow. I notice that llamafile is much slower than gpt4all for follow-up answers.

e.g. I ask (using Dolphin 2.7 Mixtral 8x7b with its lengthy system message): A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

First prompt evaluation with llamafile is about the same as for gpt4all (about 80s)

But when I reply to the answer, telling it that its answer is wrong, llamafile takes about as long for prompt processing as it did for the first answer (60s), while gpt4all answers almost immediately (6s).
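For context: llama.cpp-based servers (which llamafile wraps) can reuse the KV cache for the longest shared token prefix between the previous prompt and the new one, so a follow-up in the same conversation should only pay for the newly appended tokens. A toy sketch of that cost model (the function names here are illustrative, not llamafile's actual API):

```python
def common_prefix_len(prev_tokens, new_tokens):
    """Length of the shared token prefix between two tokenized prompts."""
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

def tokens_to_evaluate(prev_tokens, new_tokens, cache_enabled=True):
    """Tokens the model must (re)process to answer the new prompt.

    With a prefix cache, only tokens after the shared prefix are evaluated;
    without one, the whole prompt is reprocessed every turn.
    """
    if not cache_enabled:
        return len(new_tokens)
    return len(new_tokens) - common_prefix_len(prev_tokens, new_tokens)

# A follow-up that simply appends to the old conversation should be cheap:
prev = list(range(1000))       # stand-in for the tokenized first exchange
new = prev + list(range(50))   # follow-up appends 50 new tokens
assert tokens_to_evaluate(prev, new) == 50
assert tokens_to_evaluate(prev, new, cache_enabled=False) == 1050
```

If the follow-up instead takes as long as the first prompt, the cache is effectively getting no prefix hit, which is what the timings above suggest.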

d-z-m commented 3 months ago

Sounds like you might be overflowing the context window. What context size are you running llamafile with? How many tokens is your prompt? How many tokens is the first response?
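One way context overflow produces exactly this symptom: once the conversation no longer fits in the context window, the oldest tokens are dropped, so the new prompt no longer starts with the old one and a prefix-matching cache gets no hit. A toy illustration (the real eviction strategy in llama.cpp varies and may protect the system prompt; this just shows the cache-busting effect):

```python
def truncate_to_context(tokens, ctx_size, keep_prefix=0):
    """Drop the oldest tokens (after an optional protected prefix,
    e.g. a system prompt) so the conversation fits in ctx_size."""
    if len(tokens) <= ctx_size:
        return tokens
    overflow = len(tokens) - ctx_size
    return tokens[:keep_prefix] + tokens[keep_prefix + overflow:]

ctx = 2048
prev = list(range(2000))            # first exchange nearly fills the window
new = prev + list(range(200))       # follow-up pushes past it
fitted = truncate_to_context(new, ctx)
assert len(fitted) == ctx
# The truncated prompt no longer begins with the old prompt, so a
# prefix-reuse cache must re-evaluate (almost) everything:
assert fitted[:10] != prev[:10]
```

This is why the context size (`-c` in llamafile), the prompt length, and the first response length together determine whether the second turn is fast or slow.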