Closed: crazybmanp closed this issue 11 months ago
Hmm, are you referring to the chat endpoint or the completion endpoint? I modified some of the logic in the chat endpoint to support auto-retry when the context length is too long (which may have introduced a bug?), but I haven't touched the logic for the completion endpoint in a very long time. What code, specifically, are you running?
Sorry for not specifying; this is for the chat endpoint.
Good catch. My retry logic for streaming chats had a bug. I've fixed it in v1.9.
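For anyone hitting this before upgrading, here is a minimal sketch of the class of bug, not this library's actual code: a streaming retry wrapper that re-sends the request even after tokens have been received. `send_chat_stream` is a placeholder for the real streaming method.

```python
async def stream_with_retry(send_chat_stream, request, max_retries=1):
    """Yield streamed tokens, retrying only if nothing was received yet."""
    for attempt in range(max_retries + 1):
        received_any = False
        try:
            async for token in send_chat_stream(request):
                received_any = True
                yield token
            return  # stream finished cleanly
        except Exception:
            # The buggy pattern retries unconditionally here, so a
            # mid-stream failure re-sends the whole request and starts
            # a second generation. Retrying only when no tokens have
            # arrived yet prevents the duplicate call.
            if received_any or attempt == max_retries:
                raise
```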
Following the example code in the README leads to two calls to the completion endpoint when you run the async stream methods. It only seems to get one token back before starting a new generation and sending the chat request again.
This is unfortunately breaking a third-party endpoint I use; however, I have verified the problem against the official API using the usage screen on OpenAI.
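For illustration, here is a self-contained sketch (placeholder names, not this library's API) of one way an unconditional mid-stream retry could produce exactly this symptom: one token comes back, then a second billed request goes out.

```python
import asyncio

calls = 0

async def fake_stream(request):
    # Stand-in for the HTTP transport: counts outgoing chat requests
    # and fails after the first token, like a dropped connection.
    global calls
    calls += 1
    yield "Hello"
    raise ConnectionError("stream dropped")

async def buggy_stream_with_retry(request):
    # Unconditional retry, mirroring the bug: a mid-stream failure
    # re-sends the whole chat request.
    for attempt in range(2):
        try:
            async for token in fake_stream(request):
                yield token
            return
        except ConnectionError:
            if attempt == 1:
                raise

async def main():
    tokens = []
    try:
        async for t in buggy_stream_with_retry({"messages": []}):
            tokens.append(t)
    except ConnectionError:
        pass
    print(tokens)  # ['Hello', 'Hello'] - the generation restarted
    print(calls)   # 2 - two billed requests for one user call

asyncio.run(main())
```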