TODO: add tests for the counting function.
IMO we're doing too much work trying to remedy the overflow (where would we cut from? Why cut from the beginning rather than the end, or vice versa? Cutting is task-specific.)
I would fail "gracefully" instead: tell the user the prompt overflowed, send nothing to OpenAI when val > self.max_context_tokens, and ask them to shorten the prompt.
The naming in the mixin was previously confusing: max_tokens refers to the response tokens, not the context tokens!
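A minimal sketch of the fail-fast behavior suggested above (the function and exception names here are hypothetical, not part of the existing mixin; only `max_context_tokens` comes from the code under review):

```python
class ContextLimitError(ValueError):
    """Raised when the prompt would exceed the model's context window."""


def check_context_tokens(token_count: int, max_context_tokens: int) -> None:
    # Fail fast instead of silently truncating: if the prompt is too long,
    # nothing is sent to OpenAI and the user is asked to shorten it.
    if token_count > max_context_tokens:
        raise ContextLimitError(
            f"Prompt is {token_count} tokens but the context limit is "
            f"{max_context_tokens}; please shorten the prompt."
        )
```

The caller would invoke this before building the API request, so an oversized prompt surfaces as an explicit error rather than a silently cut message.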
… completion; refactoring