Handle incorrect responses from OpenAI in _extract_token method for stream:True

azachar commented 1 year ago

Hello,

In the _extract_token method of the ChatGPTInvocationLayer class, there's an assumption that the event_data["choices"] array will always have at least one item with a delta key. However, based on observed behavior, OpenAI might provide responses without this expected structure on some preview endpoints. This can lead to unexpected errors.

Steps To Reproduce:

1. Use an OpenAI endpoint that might return a response without the expected structure. For example, my chunks look like this:


data: {"id":"","object":"","created":0,"model":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}

data: {"id":"chatcmpl-XXXX","object":"chat.completion.chunk","created":1692979866,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"},"content_filter_results":{}}],"usage":null}

data: {"id":"chatcmpl-YYY","object":"chat.completion.chunk","created":1692979866,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":"stop","delta":{},"content_filter_results":{}}],"usage":null}

data: [DONE]


2. Process the response using the `ChatGPTInvocationLayer` class.
3. Observe errors due to the missing expected structure.
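
For illustration, the steps above can be replayed offline with a minimal sketch. It assumes the stream lines are shortened copies of the chunks quoted above (fields the method never reads are dropped), and `extract_token` and `replay` are hypothetical free functions that mirror the `_extract_token` logic referenced in this issue rather than the actual Haystack class.

```python
import json
from typing import Any, Dict, List, Optional

# Shortened copies of the stream lines quoted in this issue (assumption:
# fields that the method never reads have been dropped for brevity).
SAMPLE_LINES = [
    'data: {"id":"","object":"","created":0,"model":"","choices":[],"usage":null}',
    'data: {"id":"chatcmpl-XXXX","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-YYY","choices":[{"index":0,"finish_reason":"stop","delta":{}}]}',
    "data: [DONE]",
]


def extract_token(event_data: Dict[str, Any]) -> Optional[str]:
    # Same logic as the method quoted in this issue, written as a free function.
    delta = event_data["choices"][0]["delta"]
    if "content" in delta:
        return delta["content"]
    return None


def replay(lines: List[str]) -> List[str]:
    """Return one outcome per data chunk: the repr of the token or the error name."""
    outcomes = []
    for line in lines:
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        try:
            outcomes.append(repr(extract_token(json.loads(payload))))
        except (KeyError, IndexError) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes


print(replay(SAMPLE_LINES))  # ['IndexError', 'None', 'None']
```

Under these assumptions, only the first chunk fails: its `choices` list is empty, so indexing `[0]` raises an `IndexError`.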

**Expected behavior:**  
The `_extract_token` method should gracefully handle scenarios where `event_data["choices"]` doesn't follow the expected structure. If there's no data in a chunk or if the structure is different, the method should return an appropriate response or error message without crashing.

**Code reference:**
```python
def _extract_token(self, event_data: Dict[str, Any]):
    delta = event_data["choices"][0]["delta"]
    if "content" in delta:
        return delta["content"]
    return None
```

Link to the line of _extract_token

Solution suggestion:
Consider adding checks to ensure event_data["choices"] has the expected structure before accessing its items. When the structure deviates from the norm, the method could either log a warning and return None or handle the deviation in another user-friendly way.
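
A minimal sketch of what such checks might look like, written as a standalone function. The name `extract_token_checked`, the logger, and the warning messages are all hypothetical, and this is not the fix that was eventually merged.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def extract_token_checked(event_data: Dict[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for _extract_token that validates the chunk first."""
    choices = event_data.get("choices")
    # Covers both a missing "choices" key and an empty list.
    if not isinstance(choices, list) or not choices:
        logger.warning("Stream chunk has no choices, skipping: %s", event_data)
        return None

    first = choices[0]
    delta = first.get("delta") if isinstance(first, dict) else None
    if not isinstance(delta, dict):
        logger.warning("Stream chunk has no delta, skipping: %s", event_data)
        return None

    # A delta without "content" (e.g. the role-only chunk) yields None, as before.
    return delta.get("content")
```

With this shape, the empty-`choices` chunk from the report is skipped with a warning, while well-formed chunks behave as they do today.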

Looking forward to hearing your thoughts and insights on this!

Thank you for your response!

Best regards, Andrej

masci commented 1 year ago

Hi @azachar and thanks for the detailed report!

I can't prioritize this work right now, but I think the solution you suggest is a good one: trap the KeyError and return None, with a message saying the format was not what we expected.
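
As a rough sketch of the exception-trapping approach described here: in the chunk quoted in the report, `choices` is an empty list, so indexing it raises an `IndexError` rather than a `KeyError`. The sketch therefore assumes several exception types need to be caught. The function name `extract_token_trapped` and the log message are hypothetical.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def extract_token_trapped(event_data: Dict[str, Any]) -> Optional[str]:
    """Hypothetical variant of _extract_token that traps lookup errors."""
    try:
        delta = event_data["choices"][0]["delta"]
        return delta["content"] if "content" in delta else None
    except (KeyError, IndexError, TypeError):
        # KeyError: "choices" or "delta" is missing.
        # IndexError: "choices" is an empty list.
        # TypeError: a value is None or another non-indexable type.
        logger.warning("Unexpected stream chunk format: %s", event_data)
        return None
```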

I'm putting this up for contributions wanted!

ashutuptiwari commented 1 year ago

@masci I'd like to work on this issue. Just to be clear, is this a simple case of modifying the function to check that event_data has a certain structure and, if it doesn't, returning None and logging a warning?

adarsh-jha-dev commented 1 year ago

Hey @masci, I would like to work on this issue. Could you please assign it to me?

deepset-ai / haystack

Handle incorrect responses from OpenAI in _extract_token method for stream:True #5632