biobootloader / wolverine

MIT License

Planning for smart utilization of 3.5 #6

Open CiberNin opened 1 year ago

CiberNin commented 1 year ago

GPT 3.5 is so much cheaper ($0.002 vs. $0.06 per 1K tokens), not to mention it usually returns faster and is less throttled. Given that, it makes sense to always at least attempt to use GPT 3.5 first.

Given we are gonna try GPT 3.5 first, how do we determine when to fall back to GPT 4?

1) When compiling the prompt for our completion: fall back if the prompt leaves fewer than n tokens for the completion, where n is the smallest budget we expect an acceptable completion to fit in. IMO 500 tokens is a reasonable amount to reserve, but that's a variable that could use empirical measurement.
2) When receiving the completion: we should prompt for the answer to be wrapped in some delimiters, so we can detect when there were not enough tokens for GPT 3.5 to complete its attempted answer.
3) When checking whether the completion resulted in a fix for the current error: fall back if it did not. It may however be worth retrying here first while slowly ratcheting up temperature, or feeding the new error back in. We need a way to check whether GPT is just introducing new errors that happen before the original error could happen.
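Rules 1 and 2 could be sketched roughly as below. This is only an illustration, not wolverine's code: the delimiter strings, the helper names, and the 4-characters-per-token estimate are all assumptions, and the context-window sizes should be checked against the current model documentation.

```python
from typing import Optional

# Assumed context-window sizes in tokens; verify against the model docs.
CONTEXT_WINDOW = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}
RESERVED_FOR_COMPLETION = 500  # the "n" from rule 1; needs empirical tuning

# Hypothetical delimiters the prompt would ask the model to wrap its answer in.
START, END = "<<<ANSWER>>>", "<<<END>>>"


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); a real tokenizer is more accurate."""
    return len(text) // 4 + 1


def fits(model: str, prompt: str) -> bool:
    """Rule 1: does the prompt leave at least the reserved budget for the completion?"""
    return CONTEXT_WINDOW[model] - estimate_tokens(prompt) >= RESERVED_FOR_COMPLETION


def extract_answer(completion: str) -> Optional[str]:
    """Rule 2: return the delimited answer, or None if it was cut off or never delimited."""
    start = completion.find(START)
    if start == -1:
        return None
    end = completion.find(END, start + len(START))
    if end == -1:
        return None  # closing delimiter missing: the completion was likely truncated
    return completion[start + len(START):end].strip()
```

A caller would skip GPT 3.5 when `fits("gpt-3.5-turbo", prompt)` is false, and fall back to GPT 4 when `extract_answer` returns `None`.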

Additionally, the code should be future-proofed for falling back to the 32K model using rules 1 & 2 only (since it's not smarter, just bigger). Obviously, that fallback should be switchable off by a flag. Similarly, allow disabling GPT 4 (8K) using the same system.
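One way to picture the flag-controlled ladder is a list of models ordered cheapest first, filtered by per-model enable flags and by rule 1. The names `LADDER` and `candidate_models` and the `enabled` dict are hypothetical, and the window sizes are assumptions; this is a sketch, not an existing wolverine interface.

```python
from typing import Dict, List, Tuple

# Ordered cheapest first: (model name, assumed context window in tokens).
LADDER: List[Tuple[str, int]] = [
    ("gpt-3.5-turbo", 4096),
    ("gpt-4", 8192),
    ("gpt-4-32k", 32768),
]


def candidate_models(
    prompt_tokens: int, enabled: Dict[str, bool], reserve: int = 500
) -> List[str]:
    """Return enabled models, cheapest first, that leave `reserve` tokens for the completion."""
    return [
        name
        for name, window in LADDER
        if enabled.get(name, False) and window - prompt_tokens >= reserve
    ]
```

With the 32K model disabled, `candidate_models(5000, {"gpt-3.5-turbo": True, "gpt-4": True})` skips GPT 3.5 because the prompt no longer fits and leaves only `gpt-4`.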

biobootloader commented 1 year ago

Good ideas! I've done limited experimentation with using 3.5-turbo with wolverine (it's now added as a flag). It sometimes works but sometimes fails to return valid JSON. A quick optimization might be to try a few iterations with 3.5 and, if it returns invalid JSON, automatically retry with 4.
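The retry idea might look something like this. It is a minimal sketch under assumptions: `complete` is a hypothetical callable taking `(model, prompt)` and returning the raw completion text, standing in for whatever API call wolverine actually makes, and the attempt count of 3 is arbitrary.

```python
import json
from typing import Any, Callable


def complete_json(
    prompt: str,
    complete: Callable[[str, str], str],  # hypothetical: (model, prompt) -> raw text
    cheap_model: str = "gpt-3.5-turbo",
    strong_model: str = "gpt-4",
    cheap_attempts: int = 3,
) -> Any:
    """Try the cheap model a few times; fall back to the strong model on invalid JSON."""
    for _ in range(cheap_attempts):
        try:
            return json.loads(complete(cheap_model, prompt))
        except json.JSONDecodeError:
            continue  # invalid JSON: retry with the cheap model
    # A JSONDecodeError from the strong model propagates to the caller.
    return json.loads(complete(strong_model, prompt))
```

Passing the completion function in as a parameter keeps the fallback logic testable without network access.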