Closed restlessronin closed 5 months ago
@aramallo if you get a chance, could you check if this fixes all your problems? It should be on main shortly.
Thanks @restlessronin I'll give it a try this week and let you know.
Hi @restlessronin, unfortunately the error is back.
The following example uses a local LM Studio instance.
iex(1)> config = %OpenaiEx{
...(1)> token: "lmstudio",
...(1)> organization: nil,
...(1)> beta: "assistants=v1",
...(1)> base_url: "http://localhost:1234/v1",
...(1)> receive_timeout: 15000,
...(1)> finch_name: OpenaiEx.Finch,
...(1)> _ep_path_mapping: &OpenaiEx._identity/1,
...(1)> _http_headers: [
...(1)> {"Authorization", "Bearer lmstudio"},
...(1)> {"OpenAI-Beta", "assistants=v1"}
...(1)> ]
...(1)> }
%OpenaiEx{
token: "lmstudio",
organization: nil,
beta: "assistants=v1",
base_url: "http://localhost:1234/v1",
receive_timeout: 15000,
stream_timeout: :infinity,
finch_name: OpenaiEx.Finch,
_ep_path_mapping: &OpenaiEx._identity/1,
_http_headers: [
{"Authorization", "Bearer lmstudio"},
{"OpenAI-Beta", "assistants=v1"}
]
}
iex(2)> alias OpenaiEx.Chat
OpenaiEx.Chat
iex(3)> alias OpenaiEx.ChatMessage
OpenaiEx.ChatMessage
iex(4)> alias OpenaiEx.MsgContent
OpenaiEx.MsgContent
iex(5)>
nil
iex(6)> chat_req =
...(6)> Chat.Completions.new(
...(6)> model: "gpt-3.5-turbo",
...(6)> messages: [
...(6)> ChatMessage.user(
...(6)> "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"
...(6)> )
...(6)> ]
...(6)> )
%{
messages: [
%{
role: "user",
content: "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"
}
],
model: "gpt-3.5-turbo"
}
iex(7)> response = OpenaiEx.ChatCompletion.create(config, req, stream: true)
error: undefined variable "req"
└─ iex:7
** (CompileError) cannot compile code (errors have been logged)
iex(7)> response = OpenaiEx.ChatCompletion.create(config, chat_req, stream: true)
** (UndefinedFunctionError) function OpenaiEx.ChatCompletion.create/3 is undefined (module OpenaiEx.ChatCompletion is not available)
OpenaiEx.ChatCompletion.create(%OpenaiEx{token: "lmstudio", organization: nil, beta: "assistants=v1", base_url: "http://localhost:1234/v1", receive_timeout: 15000, stream_timeout: :infinity, finch_name: OpenaiEx.Finch, _ep_path_mapping: &OpenaiEx._identity/1, _http_headers: [{"Authorization", "Bearer lmstudio"}, {"OpenAI-Beta", "assistants=v1"}]}, %{messages: [%{role: "user", content: "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"}], model: "gpt-3.5-turbo"}, [stream: true])
iex:7: (file)
iex(7)> response = OpenaiEx.Chat.Completion.create(config, chat_req, stream: true)
** (UndefinedFunctionError) function OpenaiEx.Chat.Completion.create/3 is undefined (module OpenaiEx.Chat.Completion is not available)
OpenaiEx.Chat.Completion.create(%OpenaiEx{token: "lmstudio", organization: nil, beta: "assistants=v1", base_url: "http://localhost:1234/v1", receive_timeout: 15000, stream_timeout: :infinity, finch_name: OpenaiEx.Finch, _ep_path_mapping: &OpenaiEx._identity/1, _http_headers: [{"Authorization", "Bearer lmstudio"}, {"OpenAI-Beta", "assistants=v1"}]}, %{messages: [%{role: "user", content: "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"}], model: "gpt-3.5-turbo"}, [stream: true])
iex:7: (file)
iex(7)> response = OpenaiEx.Chat.Completions.create(config, chat_req, stream: true)
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
(l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
a
.../openai_ex❯ iex -S mix
Erlang/OTP 26 [erts-14.2.4] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit] [dtrace]
Interactive Elixir (1.16.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> openai = %OpenaiEx{
...(1)> token: "lmstudio",
...(1)> organization: nil,
...(1)> beta: "assistants=v1",
...(1)> base_url: "http://localhost:1234/v1",
...(1)> receive_timeout: 15000,
...(1)> finch_name: OpenaiEx.Finch,
...(1)> _ep_path_mapping: &OpenaiEx._identity/1,
...(1)> _http_headers: [
...(1)> {"Authorization", "Bearer lmstudio"},
...(1)> {"OpenAI-Beta", "assistants=v1"}
...(1)> ]
...(1)> }
%OpenaiEx{
token: "lmstudio",
organization: nil,
beta: "assistants=v1",
base_url: "http://localhost:1234/v1",
receive_timeout: 15000,
stream_timeout: :infinity,
finch_name: OpenaiEx.Finch,
_ep_path_mapping: &OpenaiEx._identity/1,
_http_headers: [
{"Authorization", "Bearer lmstudio"},
{"OpenAI-Beta", "assistants=v1"}
]
}
iex(2)>
nil
iex(3)> chat_req =
...(3)> OpenaiEx.Chat.Completions.new(
...(3)> model: "gpt-3.5-turbo",
...(3)> messages: [
...(3)> OpenaiEx.ChatMessage.user(
...(3)> "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"
...(3)> )
...(3)> ]
...(3)> )
%{
messages: [
%{
role: "user",
content: "Give me some background on the elixir language. Why was it created? What is it used for? What distinguishes it from other languages? How popular is it?"
}
],
model: "gpt-3.5-turbo"
}
iex(4)> response = OpenaiEx.Chat.Completions.create(openai, chat_req, stream: true)
%{
status: 200,
headers: [
{"x-powered-by", "Express"},
{"content-type", "text/event-stream"},
{"cache-control", "no-cache"},
{"connection", "keep-alive"},
{"date", "Wed, 01 May 2024 10:50:24 GMT"},
{"transfer-encoding", "chunked"}
],
body_stream: #Function<52.53678557/2 in Stream.resource/3>,
task_pid: #PID<0.217.0>
}
iex(5)> chat_stream.body_stream |> Stream.flat_map(& &1) |> Enum.each(fn x -> IO.puts(inspect(x)) end)
error: undefined variable "chat_stream"
└─ iex:5
** (CompileError) cannot compile code (errors have been logged)
iex(5)> response.body_stream |> Stream.flat_map(& &1) |> Enum.each(fn x -> IO.puts(inspect(x)) end)
%{data: %{"choices" => [%{"delta" => %{"content" => "The", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{"content" => " Eli", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{"content" => "x", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{"content" => "ir", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{"content" => " programming", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
...
%{data: %{"choices" => [%{"delta" => %{"content" => " applications", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{"content" => ".", "role" => "assistant"}, "finish_reason" => nil, "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
%{data: %{"choices" => [%{"delta" => %{}, "finish_reason" => "stop", "index" => 0}], "created" => 1714560626, "id" => "chatcmpl-lce79304zkd554utjrjwki", "model" => "TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q4_K_M.gguf", "object" => "chat.completion.chunk"}}
** (Jason.DecodeError) unexpected byte at position 0: 0x64 ("d")
(jason 1.4.1) lib/jason.ex:92: Jason.decode!/2
(openai_ex 0.6.0) lib/openai_ex/http_sse.ex:63: OpenaiEx.HttpSse.next_sse/1
(elixir 1.16.2) lib/stream.ex:1626: Stream.do_resource/5
(elixir 1.16.2) lib/stream.ex:943: Stream.do_transform/5
(elixir 1.16.2) lib/enum.ex:4396: Enum.each/2
iex:7: (file)
@aramallo, I think what's happening here is that the server is ending the stream (EOF?) immediately after "data: [DONE]" rather than terminating the event with two newlines. If that's the case, the implementation does not conform to the SSE spec and should probably be reported to the server provider (LM Studio)?
If this is the case, the bug will likely show up on every streaming request. Is that what is happening?
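To illustrate the framing issue, here is a minimal sketch (not the library's actual parser) of how SSE event splitting works. Per the spec, each event is terminated by a blank line, so a conforming parser only emits an event when it sees `"\n\n"`; a final `data: [DONE]` with no trailing blank line sits in the buffer until EOF, and a consumer that then tries to JSON-decode the flushed leftover raises exactly the `Jason.DecodeError` above.

```elixir
defmodule SseFramingSketch do
  # Splits a buffer into complete events (terminated by "\n\n")
  # plus the leftover partial event still waiting for its terminator.
  def split_events(buffer) do
    parts = String.split(buffer, "\n\n")
    {Enum.drop(parts, -1), List.last(parts)}
  end
end

# A well-behaved server terminates the final event with a blank line:
{done, rest} = SseFramingSketch.split_events("data: {\"x\":1}\n\ndata: [DONE]\n\n")
# done == ["data: {\"x\":1}", "data: [DONE]"]; rest == ""

# A server that closes the connection right after "data: [DONE]" leaves
# the sentinel stranded in the buffer; at EOF it gets flushed as a
# partial event and a naive consumer tries to JSON-decode it.
{done, rest} = SseFramingSketch.split_events("data: [DONE]")
# done == []; rest == "data: [DONE]"
```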
I'd prefer not to accommodate a buggy server implementation, but in the interest of practicality I've re-inserted your original fix into the main branch. I'd really appreciate your confirming that it works.
I still think that the other code branch will always receive correctly formatted JSON. Let me know if that assumption is incorrect.
@restlessronin Good catch. I'll try OpenAI and another implementation (vLLM) and let you know. If that is the case, I completely agree with your approach.
@aramallo Thanks. AFAIK it works with OpenAI and liteLLM, although I fixed one potential corner case in the PR for the current issue (after your initial report got me examining the code and protocol in detail).
Let me know if you see any problems with other local LLM proxies.
This is a follow-up PR to https://github.com/restlessronin/openai_ex/pull/83.
After I merged that PR, I decided that there had to be a better fix, closer to the point of origin of the message creation. This is that fix.
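The idea of handling this at the point of origin can be sketched as follows. This is a hypothetical illustration, not the actual PR: the `SseEventSketch.to_event/1` name and shape are invented for this example. Instead of rescuing a `Jason.DecodeError` downstream, the `[DONE]` sentinel is recognized where the SSE event is first constructed, so it never reaches the JSON decoder.

```elixir
defmodule SseEventSketch do
  # Hypothetical: turn one raw SSE line into either :done or a data
  # event, so the "[DONE]" sentinel is filtered out at the source.
  def to_event("data: " <> payload) do
    case String.trim(payload) do
      "[DONE]" -> :done
      json -> {:data, json}  # real code would Jason.decode!(json) here
    end
  end
end

:done = SseEventSketch.to_event("data: [DONE]")
{:data, "{\"x\":1}"} = SseEventSketch.to_event("data: {\"x\":1}")
```

Filtering at creation time keeps the downstream stream pipeline free of decode-error handling for the sentinel case.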