**Frankbbg** opened 10 months ago
It does not continue in a loop until it "comes to an agreement"; that functionality was removed due to high token usage. I have come up with a smarter and less expensive method: I use the newest GPT-4-turbo model, give it the rules and the context of the previous rule-break response, and ask it to reply with only "yes" or "no". "Yes" means GPT-4-turbo agrees with GPT-3.5's claim that the user broke a rule; "no" means it does not. If GPT-4-turbo responds "yes", the error is passed through and displayed to the user. If it responds "no", GPT-3.5 is forced to write the story without regard to the rules. This works without a continuous feedback loop because GPT-4-turbo is significantly smarter and makes more accurate assessments.
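A minimal sketch of that single-shot arbitration step. The `ask_gpt4` callable is a hypothetical stand-in for whatever client call sends a prompt to GPT-4-turbo and returns its text reply; only the prompt shape and the yes/no parsing come from the description above.

```python
def arbitrate(ask_gpt4, rules, flagged_response):
    """Ask the stronger model to confirm or reject a rule-violation claim.

    ask_gpt4: hypothetical callable(prompt: str) -> str that queries
    GPT-4-turbo (e.g. via an API client) and returns its reply text.
    Returns True if GPT-4-turbo agrees a rule was broken (show the error
    to the user), False otherwise (let GPT-3.5 write the story anyway).
    """
    prompt = (
        "Here are the story rules:\n" + rules +
        "\n\nGPT-3.5 claims this response broke a rule:\n" + flagged_response +
        "\n\nDo you agree that a rule was broken? Answer only 'yes' or 'no'."
    )
    verdict = ask_gpt4(prompt).strip().lower()
    return verdict.startswith("yes")
```

Because the model call is injected, the decision logic can be exercised with a stub before wiring in a real client.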
The limitation of this approach is that it is only appropriate to evaluate GPT-3.5's response when it actually makes a claim about a rule violation. How can the program detect that such a claim has been made? If I hardcode it to look for certain keywords, stories that legitimately contain those keywords will break. The only other way is to let the AI call a function when it deems it necessary: an evaluation function that appeals the claim to GPT-4-turbo. GPT-3.5 decides when to call the evaluation function, namely whenever it believes a rule violation has occurred. This automates the evaluation process and prevents the program from mistaking ordinary story content for a rule-violation claim.
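One way to sketch this is with the OpenAI-style function-calling (tools) format: GPT-3.5 is given a tool it may invoke only when it believes a rule was broken, and a small dispatcher forwards any such calls to the appeal function. The tool name, its parameters, and the `(name, arguments)` pair shape below are illustrative assumptions, not the issue author's actual code.

```python
# Hypothetical tool schema in the OpenAI function-calling format.
# GPT-3.5 decides on its own when to invoke it, so no keyword matching
# on the story text is needed.
EVALUATE_TOOL = {
    "type": "function",
    "function": {
        "name": "evaluate_rule_violation",
        "description": "Call ONLY when the response appears to break a story rule.",
        "parameters": {
            "type": "object",
            "properties": {
                "claimed_rule": {"type": "string"},
                "offending_text": {"type": "string"},
            },
            "required": ["claimed_rule", "offending_text"],
        },
    },
}

def handle_tool_calls(tool_calls, evaluator):
    """Dispatch any evaluation calls GPT-3.5 chose to make.

    tool_calls: list of (name, arguments-dict) pairs extracted from the
    model response (shape assumed for illustration).
    evaluator: the appeal function that consults GPT-4-turbo and returns
    True/False for each claim.
    """
    verdicts = []
    for name, args in tool_calls:
        if name == "evaluate_rule_violation":
            verdicts.append(evaluator(args["claimed_rule"], args["offending_text"]))
    return verdicts
```

If the model makes no tool call, the dispatcher returns an empty list and the story passes through untouched.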
# Attempting to Solve Incorrect Rule Evaluations

### Issue Description

When using ChatGPT to audit AI evaluations, we've observed a recurring problem: ChatGPT often affirms incorrect audits, leading to an inaccurate evaluation process. This undermines the quality of the audits and the trustworthiness of the system. We need to address this issue to ensure that the AI accurately evaluates AI audits, especially when they are incorrect.

### Proposed Solution
**Introducing `EvalGPT` Class**

I propose creating a new class called `EvalGPT` that is specifically designed for the task of auditing AI evaluations. `EvalGPT` will focus on providing more accurate assessments of AI audits. This class will be responsible for critical evaluations and will be designed to avoid self-affirmation issues.

**Feedback Loop with `ChatGPT`**

To mitigate the self-affirmation problem, I will introduce a feedback loop between `EvalGPT` and `ChatGPT`. The process will work as follows:

1. `ChatGPT` presents an AI audit to `EvalGPT` for evaluation.
2. `EvalGPT` uses predefined criteria to assess the audit, following a 'guilty until proven innocent' approach.
3. If `EvalGPT` identifies any discrepancies or errors in the audit, it provides detailed feedback and suggestions for improvement.
4. `ChatGPT` receives this feedback and is expected to revise its initial evaluation based on the feedback from `EvalGPT`.
5. The revised audit is resubmitted to `EvalGPT` for re-evaluation.
6. The loop repeats until `EvalGPT` and `ChatGPT` converge on an accurate audit assessment.
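The loop described above can be sketched as a small driver function. All three callables are hypothetical stand-ins for model calls, and the `max_rounds` cap is my own addition to keep token usage bounded; neither name appears in the issue.

```python
def run_feedback_loop(present_audit, evaluate, revise, max_rounds=3):
    """Iterate ChatGPT's audit through EvalGPT until it passes review.

    present_audit: callable() -> initial audit text from ChatGPT.
    evaluate: callable(audit) -> (approved: bool, feedback: str) from EvalGPT.
    revise: callable(audit, feedback) -> revised audit text from ChatGPT.
    max_rounds caps the iterations so disagreement cannot loop forever
    (an assumed safeguard, not part of the proposal as written).
    """
    audit = present_audit()
    for _ in range(max_rounds):
        approved, feedback = evaluate(audit)
        if approved:
            return audit
        audit = revise(audit, feedback)
    return audit  # best effort once the round cap is reached
```

With stubs in place of the models, a rejected first draft is revised once and then accepted on re-evaluation.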
### Benefits

- `EvalGPT` is designed for precise audit evaluations and aims to eliminate self-affirmation issues.
- The feedback loop between `EvalGPT` and `ChatGPT` ensures that even when `ChatGPT` makes mistakes, it can learn from them and improve its audit assessments over time.

### Next Steps

- Implement the `EvalGPT` class to handle audit evaluations.
- Define the evaluation criteria and interaction between `EvalGPT` and `ChatGPT`.
- Establish the feedback loop between `EvalGPT` and `ChatGPT` to facilitate iterative improvements in audit assessments.