cyberchitta / openai_ex

Community maintained Elixir library for OpenAI API
https://hexdocs.pm/openai_ex
Apache License 2.0

Streaming doesn't work with some emulated OpenAI backends #71

Closed Madd0g closed 9 months ago

Madd0g commented 10 months ago

Describe the bug

I noticed that some OpenAI API emulators use \r\n\r\n instead of \n\n as the event terminator in streaming messages, which breaks the parsing of the chunks.
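
For illustration (the payload here is made up, not the emulator's exact output), a split on "\n\n" never finds an event boundary in a CRLF-terminated stream:

# Hypothetical illustration of the mismatch: "\r\n\r\n" contains no "\n\n",
# so a split on "\n\n" returns the whole chunk unchanged.
crlf_chunk = "data: {\"choices\":[]}\r\n\r\n"
String.split(crlf_chunk, "\n\n")
#=> ["data: {\"choices\":[]}\r\n\r\n"]   # one element, no event boundary found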

To Reproduce

happens with https://github.com/oobabooga/text-generation-webui when started with the --api option

then hitting /v1/chat/completions with streaming turned on

Code snippets

No response

OS

macOS

Elixir version

1.15.7

Library version

0.4.2

restlessronin commented 10 months ago

@Madd0g thanks for reporting this problem. I think I did the initial implementation based on articles online, and they only mentioned '\n' as the line terminator. I never found the original spec.

After your report, I went looking for the SSE spec again (UPDATE: I eventually found the latest spec on a seemingly marginally related page). Looking at the spec, I can see that the end of line can be '\n', '\r' or '\r\n', so the protocol is still being correctly implemented by these proxies. The text stream is not identical to what OpenAI would produce, but a correctly functioning SSE client should still work on these streams.
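
A minimal sketch of terminator-agnostic event splitting, assuming the buffer is normalized before splitting (the helper name is illustrative, not the library's actual code):

# Sketch: treat "\r\n", "\r" and "\n" as equivalent line endings per the
# SSE spec, then split events on the resulting blank line.
defp split_events(buffer) do
  buffer
  |> String.replace(["\r\n", "\r"], "\n")
  |> String.split("\n\n")
end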

Ideally, SSE should be implemented at the level of Finch / Mint. This isn't something that should be independently re-implemented by the authors of various JSON API wrapper libraries.

In the short run it might be necessary to fix this issue if a lot of people want to use streaming chat completions with non-OpenAI models. In the long run, the fix should be merged at the Finch / Mint level.

What non-OpenAI models are you using? Is it possible to run those models with one of the other model runners (there seem to be a bunch of runtimes for alternative LLMs at the moment), hopefully one that produces the same '\n' end-of-line as OpenAI?

If this is a must-have feature, it's worth fixing in the library in the short term.

Madd0g commented 10 months ago

I'm using https://github.com/oobabooga/text-generation-webui

FWIW, I didn't have this problem with litellm acting as a proxy for ollama, but I'm trying oobabooga, which has its own OpenAI emulation.

Ideally, SSE should be implemented at the level of Finch / Mint. This isn't something that should be independently re-implemented by the authors of various JSON API wrapper libraries.

I only mentioned it because I saw String.split("\n\n") in the code, and when I changed it to \r\n\r\n it started working.

I actually saw further issues, where it sends a chunk that's not prefixed by data:, which the lib also crashes on. I patched it all locally; I'm not sure it's worth fixing if they're going so far off-spec.
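
Roughly, the first local patch amounted to something like this (a sketch reconstructed from the description above, not the exact diff; buffer and chunks are illustrative names):

# Sketch of the described change: split on the emulator's CRLF-doubled
# terminator and skip any chunk that isn't a "data: " field.
chunks = String.split(buffer, "\r\n\r\n")
data_chunks = Enum.filter(chunks, &String.starts_with?(&1, "data: "))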

restlessronin commented 10 months ago

I'm using https://github.com/oobabooga/text-generation-webui

FWIW, I didn't have this problem with litellm acting as a proxy for ollama, but I'm trying oobabooga, which has its own OpenAI emulation.

I'm guessing each of these is using a different SSE library, some of which use the same end-of-line as OpenAI and others that don't.

I only mentioned it because I saw String.split("\n\n") in the code, and when I changed it to \r\n\r\n it started working.

@Madd0g I appreciate you taking the trouble to identify the precise nature of the problem. Definitely worth bringing up.

I actually saw further issues, where it sends a chunk that's not prefixed by data:, which the lib also crashes on. I patched it all locally; I'm not sure it's worth fixing if they're going so far off-spec.

I would hate to lose the work that you have done. Could you submit a PR with your patches? I'll accept it, and depending on what turns out to be the best fix, I might make further changes as well (or revert some).

I put in this feature because I foresee wanting to work with some local models, but I'm not personally using it yet. If you (and others) are actually finding this helpful, I would definitely like to address it in some fashion.

Madd0g commented 9 months ago

I'm a total noob with Elixir, and I don't usually commit code from this account, so I don't have my SSH keys handy.

But to get past how ugly my first solution was, I took the time today to improve it with the help of an AI friend, so I'll paste it here. I don't think it's terrible.

# Accumulate incoming stream data, split complete events off the buffer,
# and decode the JSON payload of each "data:" field.
defp tokenize_data(evt_data, acc) do
  combined_data = acc <> evt_data

  if contains_terminator?(combined_data) do
    {remaining, token_chunks} = split_on_terminators(combined_data)

    tokens =
      token_chunks
      |> Enum.map(&extract_token/1)
      # drop the "[DONE]" sentinel and chunks that carried no data field
      # (Jason.decode!/1 would raise on an empty string)
      |> Enum.filter(fn %{data: data} -> data != "[DONE]" and data != "" end)
      |> Enum.map(fn %{data: data} -> %{data: Jason.decode!(data)} end)

    {tokens, remaining}
  else
    {[], combined_data}
  end
end

defp contains_terminator?(data) do
  String.contains?(data, "\n\n") or String.contains?(data, "\r\n\r\n")
end

# Normalize CRLF terminators, split into event chunks, and keep the
# (possibly incomplete) trailing chunk as the new accumulator.
defp split_on_terminators(data) do
  data
  |> String.replace("\r\n\r\n", "\n\n")
  |> String.split("\n\n")
  |> List.pop_at(-1)
end

# and here just added a default handler for when it doesn't come in as `data:`
# (returns empty data so the chunk is filtered out above)
defp extract_token(line) do
  [field | rest] = String.split(line, ": ", parts: 2)

  case field do
    "data" -> %{data: rest |> Enum.join() |> String.replace_prefix(" ", "")}
    _ -> %{data: ""}
  end
end
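
For context, a hedged example of how the accumulator behaves across two network chunks (the sample payload is made up; the calls are to the functions above):

# First chunk ends mid-event: nothing is emitted, the partial text is kept.
{[], partial} = tokenize_data("data: {\"a\"", "")
# The next chunk completes the event, so one decoded token comes out
# and the accumulator is empty again.
{[%{data: %{"a" => 1}}], ""} = tokenize_data(": 1}\r\n\r\n", partial)
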
restlessronin commented 9 months ago

@Madd0g at first glance, the code looks quite good. I should be able to use most of it.

Having found the latest spec for the stream format, it appears that a line with an empty field name (i.e., one starting with ':') is a comment, which has to be ignored.

The other fields (event, id, and retry) should perhaps be handled as well. Since I didn't have the spec when I did the first implementation, I deliberately did not put in a default branch (so that fast failure would tell me which fields were missing). Since I never saw a failure with OpenAI, I didn't look at it again. Now would be a good time to add those in.
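
A rough sketch of what per-field handling might look like (the field names come from the SSE spec; the handler shape is an assumption, not the library's actual implementation):

# Sketch: dispatch on the SSE field name; ":" comment lines are ignored.
defp handle_field(line) do
  case String.split(line, ":", parts: 2) do
    ["", _comment] -> :ignore
    ["data", value] -> {:data, String.trim_leading(value, " ")}
    ["event", value] -> {:event, String.trim_leading(value, " ")}
    ["id", value] -> {:id, String.trim_leading(value, " ")}
    ["retry", value] -> {:retry, String.trim_leading(value, " ")}
    _ -> :ignore
  end
end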

In the spirit of recognizing contribution, I would like to add your username as the author of the commit. Would that be OK with you? Is @Madd0g the right account to acknowledge, or is there a better one?

Madd0g commented 9 months ago

yes, you can add my username, thanks :)

restlessronin commented 9 months ago

@Madd0g I think I'll need an email for the commit, even if it's the gh-generated "anonymous" email.

Madd0g commented 9 months ago

@restlessronin - I know what you mean, I've seen those emails before, but I'm not sure how to generate one myself. I tried searching but couldn't figure out how to do it.

restlessronin commented 9 months ago

@Madd0g I think you need to have a valid verified email, and they generate the anonymized one automatically if you choose to keep your verified email private.

Madd0g commented 9 months ago

1171003+Madd0g@users.noreply.github.com

this probably?

restlessronin commented 9 months ago

I have published v0.5.1 to hexpm with the SSE fixes.
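
For anyone following along, pulling in the fix should just be a dependency bump in mix.exs (the version constraint shown is illustrative):

# mix.exs
defp deps do
  [
    {:openai_ex, "~> 0.5.1"}
  ]
end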

@Madd0g it would be great if you could confirm that it's working for your use cases.

I am closing this issue as completed.

Madd0g commented 9 months ago

Sorry for the delay, I will confirm; just ultra busy, will try to get to it before the weekend.

Madd0g commented 8 months ago

Haven't touched Elixir for a few weeks, almost forgot how to use it.

I upgraded to the latest version. It didn't work out of the box; I had to change some code around to look more like the example code on hexdocs, but after that everything worked perfectly!

restlessronin commented 8 months ago

Thanks for testing @Madd0g! 🙏🏾

Yeah. I had to change the API to allow streaming request cancellation. I hope that's working on the local LLMs as well. Any chance you can confirm that?

Madd0g commented 8 months ago

yes, tested - works well