Closed: Infinoid closed this issue 3 weeks ago
Here are some other things I checked:

- I hit the `ollama` service's `/api/generate` endpoint directly using `curl`, and the responses look just fine. I had two requests running in parallel, and their responses both look fine.
- In both curl and Chrome devtools, if I ignore all of the JSON and just look at the `response` fields, each story is readable.

So I think it's a problem with the UI applying responses to the wrong session tab.
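For what it's worth, the "ignore the JSON, read only the `response` fields" check can be sketched like this. This is a minimal TypeScript sketch: the chunk shape follows Ollama's `/api/generate` NDJSON streaming format (one JSON object per line, each carrying the next piece of text in `response`), and the sample data is invented.

```typescript
// Each streamed line is a JSON object; text arrives in the "response" field.
interface GenerateChunk {
	response?: string;
	done?: boolean;
}

// Concatenate only the "response" fields of one streamed reply.
function extractStory(ndjsonStream: string): string {
	return ndjsonStream
		.split('\n')
		.filter((line) => line.trim().length > 0)
		.map((line) => (JSON.parse(line) as GenerateChunk).response ?? '')
		.join('');
}

// Two replies captured from "parallel" requests, as in the curl test:
const frogStream = '{"response":"Once a frog"}\n{"response":" hopped."}\n{"done":true}';
const insectStream = '{"response":"A stick insect"}\n{"response":" swayed."}\n{"done":true}';

console.log(extractStory(frogStream)); // Once a frog hopped.
console.log(extractStory(insectStream)); // A stick insect swayed.
```

Each stream read in isolation yields one coherent story, which is consistent with the server side behaving correctly.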
@Infinoid thanks for the bug report.
I'm currently working on a PR to fix another issue that I think would also address this one (though maybe partially).
In fact, can you try it out? https://api-chat-endpoint.hollama.pages.dev/
The implementation in that environment is not 100% finished, but I just did a quick test with 2 completions running simultaneously in 2 tabs and they didn't get jumbled 🤞
What I did notice is that Ollama's GPU usage remained high after the completions in both sessions were over, and it went down (perhaps not coincidentally) right after I closed the tabs.
I also noticed that the completion in the 2nd tab wasn't saved to `localStorage` when I closed it, which would indicate it probably got overwritten by the 1st completion process.
I should mention that this bug is much more visible when using a slow model, for instance a large model that doesn't fit into GPU memory. If it takes 15 or more seconds to generate the full response, it should be easy to reproduce this issue.
> In fact, can you try it out? https://api-chat-endpoint.hollama.pages.dev/
I tried it, and it behaves a bit differently.
I used `gemma2:27b` for this one, because it's nice and slow.
I asked one session to write a story about a frog. Then I asked a second session to write a story about a stick insect.
It's still running... but what I see right now is that the first session doesn't have a response, and the second session looks like it started in the middle of a frog story. I don't see two interleaved responses, but one of the responses is truncated and appears in the wrong tab, while the other response isn't visible at all.
Upon completion, I see two responses in the second session tab:
Here's the point where the first story ended and the second story started:
> After all, there were countless worlds to explore, countless stories waiting to be told, and Ferdinand, the Emerald Prophet frog, was just getting started.Bartholomew "Bart" Branchington had always prided himself on his camouflage.
Searching through both session tabs, I only see my second (stick insect) prompt; my first (frog) prompt has vanished.
Understood, thanks for the clarification. I will need to investigate this further.
If you don't mind, can you tell me which version of Ollama you are running? (`ollama -v`)
The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0
> Understood, thanks for the clarification. I will need to investigate this further.
No problem. Good luck with it!
> If you don't mind, can you tell me which version of Ollama you are running? (`ollama -v`)
>
> The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0
Good to know. I've only started using it very recently, so I was unfamiliar with the history.
```
% podman exec ollama ollama -v
ollama version is 0.2.8
```
I'm no longer able to replicate the "jumbled responses" issue on https://hollama.fernando.is now that #125 is merged.
Here's a video of 2 simultaneous completions using `gemma2:27b` with the prompts:

- Could you write an isekai very short story where the protagonist is a STICK INSECT?
- Could you write an isekai very short story where the protagonist is a FROG?
https://github.com/user-attachments/assets/c4290822-79a0-43ae-b264-3e6c418c13a6
The completion was sped up 8x during video editing.
That being said, there is another bug present (visible in the video) in which the session that finishes last overrides the first one during `saveSession()`: https://github.com/fmaclen/hollama/issues/127
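The race behind that second bug could look roughly like this. This is an illustrative sketch only (the names and storage layout are hypothetical, not Hollama's actual code): each completion keeps an in-memory snapshot of the sessions list, and whichever finishes last writes its stale snapshot back.

```typescript
// Hypothetical last-write-wins race on saved sessions.
type Session = { id: string; response: string };

const storage = new Map<string, string>(); // stand-in for localStorage

function saveSessions(sessions: Session[]): void {
	storage.set('hollama-sessions', JSON.stringify(sessions));
}

function loadSessions(): Session[] {
	return JSON.parse(storage.get('hollama-sessions') ?? '[]');
}

saveSessions([{ id: 'a', response: '' }, { id: 'b', response: '' }]);

// Both completions snapshot the sessions list when they start...
const snapshotA = loadSessions();
const snapshotB = loadSessions();

// ...completion A finishes and saves its result...
snapshotA[0].response = 'frog story';
saveSessions(snapshotA);

// ...then completion B finishes and saves its stale snapshot,
// silently discarding A's response.
snapshotB[1].response = 'stick insect story';
saveSessions(snapshotB);

console.log(loadSessions()[0].response); // prints an empty string: A's story was lost
```

Re-reading the stored sessions and merging in only the finishing session's fields just before saving would avoid the overwrite.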
To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two. So when I said "session tab", that's what I was referring to.
I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?
> To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two.
Understood, thanks for the clarification. I was worried I overlooked a detail in your initial report, that's why I didn't close the issue 😅
Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).
Fixing that use case is not trivial and will likely involve a large refactor, which I'm hesitant to do at this point. That being said, the app should:
- Disable the "New session" button while a completion is in progress.
- Or abort the current completion if "New session" is clicked.
> I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?
No, it's the same instance, unless you open the 2nd tab in Incognito mode.
> Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).
> Fixing that use-case is not trivial and will likely involve a large refactor which I'm hesitant to do at this point. That being said, the app should:
>
> - Disable the "New session" button while a completion is in progress.
> - Or abort the current completion if "New session" is clicked.
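The second option could be wired up with the standard `AbortController` API, roughly like this (names are illustrative; in the real app the controller's signal would be passed to the streaming `fetch` call):

```typescript
// Track the controller for whichever completion is currently streaming.
let currentCompletion: AbortController | null = null;

function startCompletion(): AbortController {
	currentCompletion = new AbortController();
	// In the real app, something like:
	// fetch('/api/generate', { signal: currentCompletion.signal, ... })
	return currentCompletion;
}

function newSession(): void {
	// Abort whatever is still streaming before switching sessions.
	currentCompletion?.abort();
	currentCompletion = null;
}

const completion = startCompletion();
newSession();
console.log(completion.signal.aborted); // true
```

Aborting via the signal makes the in-flight `fetch` reject, so the old completion can't keep appending chunks after the user has moved on.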
If two sessions already existed, you can flip between them and submit new queries in both; that's where the confusion happens. So maybe make those other sessions unclickable, too.
I think this approach is workable for now. As long as the user doesn't expect it to work, they won't complain when it doesn't. :)
Thanks for confirming that the issue is real, glad I'm not going crazy!
If you have two session tabs where the AI is responding to two prompts at the same time, their outputs get jumbled together.
I think Hollama has two HTTP connections open, receiving chunked-encoding responses from each, but those responses are both added to whichever session tab is currently selected (active) in the UI, not the session tab that the prompt came from.
To reproduce:
For example, I had two `llama3.1:8b` sessions open, and asked each of them to generate a short story at the same time, one about a grasshopper and one about a carpenter. As the responses were generated, I saw jumbled output in the second session tab that looks like this:

[screenshot of the jumbled output]

After the two stories finished generating:

[screenshot of the final state]
I suspect that if I had selected the first session tab, the two text boxes would have ended up in that tab, instead.
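If that diagnosis is right, the fix would be to bind each stream handler to the session that issued the prompt, rather than to whichever session is selected when a chunk arrives. A minimal sketch (hypothetical names, not Hollama's actual code):

```typescript
// Two sessions and a pointer to the one currently shown in the UI.
const sessions = new Map<string, string>([['frog', ''], ['insect', '']]);
let activeSessionId = 'frog';

// Buggy pattern: chunks land in whichever session is selected right now.
function appendToActive(chunk: string): void {
	sessions.set(activeSessionId, sessions.get(activeSessionId)! + chunk);
}

// Fixed pattern: the session id is captured when the request starts,
// so chunks always reach the session that issued the prompt.
function makeAppender(sessionId: string): (chunk: string) => void {
	return (chunk) => sessions.set(sessionId, sessions.get(sessionId)! + chunk);
}

const appendFrog = makeAppender('frog');

// The user switches to the insect tab while the frog story is streaming...
activeSessionId = 'insect';

appendToActive('misplaced chunk'); // bug: lands in the insect session
appendFrog('frog chunk'); // fix: lands in the frog session

console.log(sessions.get('insect')); // misplaced chunk
console.log(sessions.get('frog')); // frog chunk
```

Closing over the session id at request time makes switching tabs mid-stream harmless, which matches the behavior observed after #125.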