If the response exceeds the token limit, which is set in constants.py, it is simply truncated and can no longer be parsed as JSON. There is currently no mechanism for detecting or recovering from truncated responses. I have the token limit set to its maximum (8192), but even that doesn't guarantee the problem is avoided, and users may wish to reduce the value to limit costs, which makes truncated responses more likely.
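For reference, a minimal sketch of the current failure mode (`parse_response` is an illustrative name, not an actual function in the codebase):

```python
import json

def parse_response(raw: str):
    """Attempt to parse the model's reply.

    A reply cut off at the token limit is not valid JSON, so json.loads
    raises, and nothing downstream can tell truncation apart from any
    other malformed output.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # truncated? malformed? no way to know yet
```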
To do:
[ ] Modify the system prompt to request an unambiguous marker at the beginning and end of the response. These markers can be stripped before parsing the JSON, and the absence of the closing marker signals that the response was truncated (see the detection sketch after this list).
[ ] If the response is truncated, automatically send a message back to the API stating that the response was truncated, requesting a more concise response, and including the current token limit for reference (see the retry sketch after this list).
[ ] Post a message to the chat saying something like "I'm sorry, my first response didn't fit into the current token limit of {token_limit}. I'm going to try being more concise. You might also consider increasing the token limit in the settings".
[ ] Limit the number of retries using the MAX_RETRIES variable in constants.py. Come up with a message for the chat explaining that we've reached the maximum number of retries and couldn't produce a response that fits within the token limit; the user can then ask a new question.
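A possible sketch of the sentinel approach from the first item, assuming the system prompt asks the model to wrap its reply in marker strings (the `<<BEGIN>>` / `<<END>>` literals below are placeholders, not anything the code currently uses):

```python
BEGIN_MARK = "<<BEGIN>>"  # placeholder sentinels; the real strings would be
END_MARK = "<<END>>"      # whatever the system prompt asks the model to emit

def extract_payload(raw: str) -> tuple[str | None, bool]:
    """Strip the sentinel markers and report whether the reply looks truncated.

    Returns (payload, truncated). payload is None when even the opening
    marker is missing.
    """
    text = raw.strip()
    if not text.startswith(BEGIN_MARK):
        return None, True
    truncated = not text.endswith(END_MARK)
    payload = text[len(BEGIN_MARK):]
    if not truncated:
        payload = payload[:-len(END_MARK)]
    return payload.strip(), truncated
```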
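And a sketch of the retry flow from the remaining items, building on `extract_payload` above. `send_to_api` and `post_to_chat` are hypothetical stand-ins for the real API client and chat UI, and `MAX_TOKENS` is an assumed name for the token limit in constants.py (only MAX_RETRIES is referred to by name above):

```python
import json

from constants import MAX_RETRIES, MAX_TOKENS  # MAX_TOKENS name is assumed

def get_parsed_response(messages, send_to_api, post_to_chat):
    """Request a response, retrying with a 'be more concise' follow-up
    whenever the reply comes back truncated."""
    for attempt in range(MAX_RETRIES + 1):  # first attempt + up to MAX_RETRIES retries
        raw = send_to_api(messages)
        payload, truncated = extract_payload(raw)
        if payload is not None and not truncated:
            return json.loads(payload)
        if attempt == MAX_RETRIES:
            break  # out of retries; fall through to the apology below
        post_to_chat(
            f"I'm sorry, my response didn't fit into the current token limit of "
            f"{MAX_TOKENS}. I'm going to try being more concise. You might also "
            "consider increasing the token limit in the settings."
        )
        messages.append({
            "role": "user",
            "content": (
                f"Your previous response was truncated at the {MAX_TOKENS}-token "
                "limit. Please send a more concise response that fits within it."
            ),
        })
    post_to_chat(
        "I'm sorry, I reached the maximum number of retries and couldn't produce "
        "a response that fits within the token limit. Please try asking a new question."
    )
    return None
```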