Closed: Infinoid closed this issue 3 weeks ago
Here are some other things I checked:

- I hit the `ollama` service's `/api/generate` endpoint directly using `curl`, and the responses look just fine. I had two requests running in parallel, and their responses both look fine.
- In both curl and Chrome devtools, if I ignore all of the JSON and just look at the `response` fields, each story is readable.

So I think it's a problem with the UI applying responses to the wrong session tab.
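For what it's worth, the "ignore the JSON, read only the `response` fields" check can be sketched like this. This is a minimal TypeScript sketch: the chunk shape follows Ollama's `/api/generate` NDJSON streaming format (one JSON object per line, each carrying the next piece of text in `response`), and the sample data is invented.

```typescript
// Each streamed line is a JSON object; text arrives in the "response" field.
interface GenerateChunk {
	response?: string;
	done?: boolean;
}

// Concatenate only the "response" fields of one streamed reply.
function extractStory(ndjsonStream: string): string {
	return ndjsonStream
		.split('\n')
		.filter((line) => line.trim().length > 0)
		.map((line) => (JSON.parse(line) as GenerateChunk).response ?? '')
		.join('');
}

// Two replies captured from "parallel" requests, as in the curl test:
const frogStream = '{"response":"Once a frog"}\n{"response":" hopped."}\n{"done":true}';
const insectStream = '{"response":"A stick insect"}\n{"response":" swayed."}\n{"done":true}';

console.log(extractStory(frogStream)); // Once a frog hopped.
console.log(extractStory(insectStream)); // A stick insect swayed.
```

Each stream read in isolation yields one coherent story, which is consistent with the server side behaving correctly.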
@Infinoid thanks for the bug report.
I'm currently working on a PR to fix another issue that I think would also address this one (though maybe partially).
In fact, can you try it out? https://api-chat-endpoint.hollama.pages.dev/
The implementation in that environment is not 100% finished, but I just did a quick test with 2 completions running simultaneously in 2 tabs and they didn't get jumbled 🤞
What I did notice is that Ollama's GPU usage remained high after the completions in both sessions were over, and it went down (perhaps not coincidentally) right after I closed the tabs.
I also noticed that the completion in the 2nd tab wasn't saved to `localStorage` when I closed it, which would indicate it probably got overwritten by the 1st completion process.
I should mention that this bug is much more visible when using a slow model, for instance a large model that doesn't fit into GPU memory. If it takes 15 or more seconds to generate the full response, it should be easy to reproduce this issue.
> In fact, can you try it out? https://api-chat-endpoint.hollama.pages.dev/
I tried it, and it behaves a bit differently.
I used `gemma2:27b` for this one, because it's nice and slow.
I asked one session to write a story about a frog. Then I asked a second session to write a story about a stick insect.
It's still running... but what I see right now is that the first session doesn't have a response, and the second session looks like it started in the middle of a frog story. I don't see two interleaved responses, but one of the responses is truncated and appears in the wrong tab, while the other response isn't visible at all.
Upon completion, I see two responses in the second session tab:
Here's the point where the first story ended and the second story started:
> After all, there were countless worlds to explore, countless stories waiting to be told, and Ferdinand, the Emerald Prophet frog, was just getting started.Bartholomew "Bart" Branchington had always prided himself on his camouflage.
Searching through both session tabs, I only see my second (stick insect) prompt; my first (frog) prompt has vanished.
Understood, thanks for the clarification. I will need to investigate this further.
If you don't mind, can you tell me which version of Ollama you are running? (`ollama -v`)
The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0
> Understood, thanks for the clarification. I will need to investigate this further.
No problem. Good luck with it!
> If you don't mind, can you tell me which version of Ollama you are running? (`ollama -v`)
>
> The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0
Good to know. I've only started using it very recently, so I was unfamiliar with the history.
```
% podman exec ollama ollama -v
ollama version is 0.2.8
```
I'm no longer able to replicate the "jumbled responses" issue on https://hollama.fernando.is now that #125 is merged.
Here's a video of 2 simultaneous completions using `gemma2:27b` with the prompts:

- Could you write an isekai very short story where the protagonist is a STICK INSECT?
- Could you write an isekai very short story where the protagonist is a FROG?
https://github.com/user-attachments/assets/c4290822-79a0-43ae-b264-3e6c418c13a6
The completion was sped up 8x during video editing.
That being said, there is another bug present (visible in the video) in which the session that finishes last overrides the first one during `saveSession()`: https://github.com/fmaclen/hollama/issues/127
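The race behind that second bug could look roughly like this. This is an illustrative sketch only (the names and storage layout are hypothetical, not Hollama's actual code): each completion keeps an in-memory snapshot of the sessions list, and whichever finishes last writes its stale snapshot back.

```typescript
// Hypothetical last-write-wins race on saved sessions.
type Session = { id: string; response: string };

const storage = new Map<string, string>(); // stand-in for localStorage

function saveSessions(sessions: Session[]): void {
	storage.set('hollama-sessions', JSON.stringify(sessions));
}

function loadSessions(): Session[] {
	return JSON.parse(storage.get('hollama-sessions') ?? '[]');
}

saveSessions([{ id: 'a', response: '' }, { id: 'b', response: '' }]);

// Both completions snapshot the sessions list when they start...
const snapshotA = loadSessions();
const snapshotB = loadSessions();

// ...completion A finishes and saves its result...
snapshotA[0].response = 'frog story';
saveSessions(snapshotA);

// ...then completion B finishes and saves its stale snapshot,
// silently discarding A's response.
snapshotB[1].response = 'stick insect story';
saveSessions(snapshotB);

console.log(loadSessions()[0].response); // prints an empty string: A's story was lost
```

Re-reading the stored sessions and merging in only the finishing session's fields just before saving would avoid the overwrite.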
To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two. So when I said "session tab", that's what I was referring to.
I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?
> To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two.
Understood, thanks for the clarification. I was worried I overlooked a detail in your initial report, that's why I didn't close the issue 😅
Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).
Fixing that use case is not trivial and will likely involve a large refactor, which I'm hesitant to do at this point. That being said, the app should:
- Disable the "New session" button while a completion is in progress.
- Or abort the current completion if "New session" is clicked.
> I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?
No, it's the same instance, unless you open the 2nd tab in Incognito mode.
> Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).
> Fixing that use-case is not trivial and will likely involve a large refactor which I'm hesitant to do at this point. That being said, the app should:
>
> - Disable the "New session" button while a completion is in progress.
> - Or abort the current completion if "New session" is clicked.
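The second option could be wired up with the standard `AbortController` API, roughly like this (names are illustrative; in the real app the controller's signal would be passed to the streaming `fetch` call):

```typescript
// Track the controller for whichever completion is currently streaming.
let currentCompletion: AbortController | null = null;

function startCompletion(): AbortController {
	currentCompletion = new AbortController();
	// In the real app, something like:
	// fetch('/api/generate', { signal: currentCompletion.signal, ... })
	return currentCompletion;
}

function newSession(): void {
	// Abort whatever is still streaming before switching sessions.
	currentCompletion?.abort();
	currentCompletion = null;
}

const completion = startCompletion();
newSession();
console.log(completion.signal.aborted); // true
```

Aborting via the signal makes the in-flight `fetch` reject, so the old completion can't keep appending chunks after the user has moved on.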
If two sessions already existed, you can flip between them and submit new queries in both; that's where the confusion happens. So maybe make those other sessions unclickable, too.
I think this approach is workable for now. As long as the user doesn't expect it to work, they won't complain when it doesn't. :)
Thanks for confirming that the issue is real, glad I'm not going crazy!
If you have two session tabs where the AI is responding to two prompts at the same time, their outputs get jumbled together.
I think Hollama has two HTTP connections open, receiving chunked-encoding responses from each, but those responses are both added to whichever session tab is currently selected (active) in the UI, not the session tab that the prompt came from.
To reproduce:
For example, I had two `llama3.1:8b` sessions open, and asked each of them to generate a short story at the same time, one about a grasshopper and one about a carpenter. As the responses were generated, I saw jumbled output in the second session tab that looks like this:

[screenshot of the jumbled output]

After the two stories finished generating:

[screenshot of the final state]
I suspect that if I had selected the first session tab, the two text boxes would have ended up in that tab, instead.
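If that diagnosis is right, the fix would be to bind each stream handler to the session that issued the prompt, rather than to whichever session is selected when a chunk arrives. A minimal sketch (hypothetical names, not Hollama's actual code):

```typescript
// Two sessions and a pointer to the one currently shown in the UI.
const sessions = new Map<string, string>([['frog', ''], ['insect', '']]);
let activeSessionId = 'frog';

// Buggy pattern: chunks land in whichever session is selected right now.
function appendToActive(chunk: string): void {
	sessions.set(activeSessionId, sessions.get(activeSessionId)! + chunk);
}

// Fixed pattern: the session id is captured when the request starts,
// so chunks always reach the session that issued the prompt.
function makeAppender(sessionId: string): (chunk: string) => void {
	return (chunk) => sessions.set(sessionId, sessions.get(sessionId)! + chunk);
}

const appendFrog = makeAppender('frog');

// The user switches to the insect tab while the frog story is streaming...
activeSessionId = 'insect';

appendToActive('misplaced chunk'); // bug: lands in the insect session
appendFrog('frog chunk'); // fix: lands in the frog session

console.log(sessions.get('insect')); // misplaced chunk
console.log(sessions.get('frog')); // frog chunk
```

Closing over the session id at request time makes switching tabs mid-stream harmless, which matches the behavior observed after #125.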