danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License

[Bug]: Titling being done by chat model #974

Closed: jordantgh closed this issue 1 year ago

jordantgh commented 1 year ago

Contact Details

No response

What happened?

I originally noticed the issue when sending prompts via gpt-4-32k. I could see in my API usage that I was being charged twice for every new chat I started, and it was clear that the second charge came from gpt-4-32k being used to generate the title. I already begrudge paying $0.5-1 for a single prompt, but doubling it for the chat title is too much. I originally thought this was intended behaviour, to handle very large prompts that don't fit in the gpt-3.5-turbo context. (Even so, my preferred behaviour in that case would be to leave the chat untitled if the prompt is too big to fit.) However, I noticed that it uses gpt-4-32k for titling even with a small prompt, so something is wrong: it always uses the active chat model.

I chased the bug down to the following source. OpenAIClient.titleConvo() tries to pass gpt-3.5-turbo-0613 (or gpt-3.5-turbo) to setOptions() (via sendPayload()), but setOptions() fails to update the class property appropriately:

    const modelOptions = this.options.modelOptions || {};
    if (!this.modelOptions) {
      this.modelOptions = {
        ...modelOptions,
        model: modelOptions.model || 'gpt-3.5-turbo',
        temperature:
          typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
        top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
        presence_penalty:
          typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
        stop: modelOptions.stop,
      };
    }
    // Note: there is no else branch, so once this.modelOptions exists,
    // an incoming model (e.g. gpt-3.5-turbo for titling) is silently ignored.

If this.modelOptions already exists (which I assume it will once any message has been sent), it will not be updated. That means the model used for titling can never differ from the model used for chatting, which seems suboptimal. To fix this I will be submitting a PR with a small change that adds an else clause to simply update the model. This is a bit of a hack: it doesn't address the wider issue of letting the user decide which model to use, and very long gpt-4-32k prompts will still error out and leave the chat untitled. But it will stop people getting a fright from much-larger-than-expected API costs, which seems the most pressing problem.
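
For reference, the change I have in mind looks roughly like this (a sketch of the else clause; the actual PR may differ slightly):

    const modelOptions = this.options.modelOptions || {};
    if (!this.modelOptions) {
      // ...existing initialization shown above...
    } else if (modelOptions.model) {
      // Proposed fix: let a later setOptions() call change the model,
      // e.g. titleConvo() passing gpt-3.5-turbo for title generation.
      this.modelOptions.model = modelOptions.model;
    }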

Steps to Reproduce

  1. For chatting, choose a model outside the gpt-3.5-turbo-* family.
  2. Check your API usage and see that the chosen chat model is called twice for every new chat created.

What browsers are you seeing the problem on?

Chrome

Relevant log output

No response

Screenshots

(screenshot attached)

Code of Conduct

I agree to follow this project's Code of Conduct

danny-avila commented 1 year ago

Thanks for reporting this, I'll look into this soon, as it's definitely not intended.

Also, the newer title method using langchain tries to keep the entire title prompt under 100 characters, regardless of the size of the original prompt, because it only uses snippets. I can also update the older method, which is the fallback, to behave the same way. It may be that the fallback method is being used here, perhaps a symptom of openrouter or the particular model you selected. I also use openrouter, so I can test with your exact configuration.
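
For illustration, the snippet approach looks roughly like this (a rough sketch, not the actual langchain code; the names and limits are illustrative):

    // Build the title prompt from short excerpts of the conversation,
    // so input cost stays small regardless of the original prompt size.
    function buildTitlePrompt(userMessage, assistantReply, max = 280) {
      const snippet = (text) => (text.length > max ? text.slice(0, max) + '...' : text);
      return (
        'Write a concise title (5 words or fewer) for this conversation:\n' +
        `User: ${snippet(userMessage)}\n` +
        `Assistant: ${snippet(assistantReply)}`
      );
    }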

jordantgh commented 1 year ago

No worries. My PR is there if you want to take a look. It is quite a simplistic fix. It's my first ever PR in a repo other than my own, so apologies if it isn't up to scratch.

danny-avila commented 1 year ago

Any PR is a good PR in my book! Even when it's not up to scratch, it's an opportunity for code review and conversation. Thanks for doing that. I plan to approve it and do the bit of extra work to further minimize token usage for titling, while also making the model configurable and the whole feature optional.
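
Roughly, I'm imagining something like this (a rough sketch; the setting names are illustrative, not final):

    // Read the titling model and an on/off toggle from the environment
    // instead of hard-coding the active chat model.
    const titleConvoEnabled = process.env.TITLE_CONVO !== 'false';
    const titleModel = process.env.TITLE_MODEL || 'gpt-3.5-turbo';

    async function generateTitle(client, text) {
      if (!titleConvoEnabled) {
        return null; // feature disabled: leave the chat untitled
      }
      // Assumes a titleConvo() that accepts a model override.
      return client.titleConvo({ text, model: titleModel });
    }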