GPT4 gets special treatment in the script for historical reasons (at least, that's my explanation: the main API key didn't have access to it at the time). This isn't necessary, and the handling should be consolidated so that running the benchmark really is just one call.