Knguyen-dev / SDEV-265-Group-4

Shared repo for sdev 265 group

Using `EvalGPT` to Correct ChatGPT Audits #31

Open Frankbbg opened 10 months ago

Frankbbg commented 10 months ago

Attempting to Solve Incorrect Rule Evaluations

Issue Description

When using ChatGPT to audit AI evaluations, we've observed a recurring problem: ChatGPT often affirms incorrect audits, which makes the evaluation process unreliable. This undermines both the quality of the audits and the trustworthiness of the system. We need to address this so that the auditing AI reliably catches evaluations that are actually incorrect instead of simply agreeing with them.

Proposed Solution

Introducing the EvalGPT Class

I propose creating a new class called EvalGPT that is specifically designed for the task of auditing AI evaluations. EvalGPT will focus on providing more accurate assessments of AI audits. This class will be responsible for critical evaluations and will be designed to avoid self-affirmation issues.
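For illustration, here is a minimal sketch of what such a class could look like in Python. The class name `EvalGPT` comes from this proposal, but the method name, prompt wording, and model choice below are placeholders rather than a final design.

```python
# Sketch only: method name, prompt wording, and model choice are placeholders,
# not the final implementation.
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class EvalGPT:
    """Audits rule-violation claims made by the main ChatGPT instance."""

    SYSTEM_PROMPT = (
        "You are a strict auditor. Treat every rule-violation claim as "
        "unproven ('guilty until proven innocent'). Point out any discrepancy "
        "between the cited rule and the user's actual message."
    )

    def __init__(self, model="gpt-4-turbo-preview"):
        self.model = model

    def evaluate_audit(self, rules, user_message, audit):
        """Return EvalGPT's critique of a single audit."""
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.SYSTEM_PROMPT},
                {"role": "user", "content": (
                    f"Rules:\n{rules}\n\n"
                    f"User message:\n{user_message}\n\n"
                    f"Audit to evaluate:\n{audit}"
                )},
            ],
        )
        return response.choices[0].message.content
```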

Feedback Loop with ChatGPT

To mitigate the self-affirmation problem, I will introduce a feedback loop between EvalGPT and ChatGPT. The process will work as follows (a rough sketch of the loop appears after the list):

  1. ChatGPT presents an AI audit to EvalGPT for evaluation.
  2. EvalGPT uses predefined criteria to assess the audit, following a 'guilty until proven innocent' approach.
  3. If EvalGPT identifies any discrepancies or errors in the audit, it provides detailed feedback and suggestions for improvement.
  4. ChatGPT receives this feedback and is expected to revise its initial evaluation based on the feedback from EvalGPT.
  5. The revised evaluation is presented again to EvalGPT for re-evaluation.
  6. This feedback loop continues until EvalGPT and ChatGPT converge on an accurate audit assessment.
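As a rough illustration, the steps above could be wired together like this, reusing the `EvalGPT` sketch from earlier. The helpers `chatgpt_audit` and `chatgpt_revise`, the string-based convergence check, and the `max_rounds` cap are all hypothetical; the proposal itself does not specify them.

```python
# Rough sketch of the proposed loop; chatgpt_audit, chatgpt_revise, and the
# convergence check are hypothetical helpers, and max_rounds is a safety cap
# not specified in the proposal.
def run_feedback_loop(rules, user_message, chatgpt_audit, chatgpt_revise,
                      evaluator, max_rounds=5):
    audit = chatgpt_audit(rules, user_message)                           # step 1
    for _ in range(max_rounds):
        feedback = evaluator.evaluate_audit(rules, user_message, audit)  # steps 2-3
        if "no discrepancies" in feedback.lower():                       # step 6: converged
            return audit
        audit = chatgpt_revise(audit, feedback)                          # steps 4-5
    return audit  # stop after max_rounds to keep token usage bounded
```

Capping the number of rounds is one way to keep token usage bounded if the two models never converge on their own.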

Benefits

Next Steps

Frankbbg commented 10 months ago

Updates to this Model

Feedback Loop Update

The feedback loop no longer runs until the two models "come to an agreement". That behavior has been removed because of its high token usage, and I have replaced it with a smarter, less expensive method. In the new method, the newest GPT-4-turbo model is given the rules and the context of the previous rule-break response and is asked to reply with a simple "yes" or "no". "Yes" means GPT-4-turbo agrees with GPT-3.5's claim that the user broke a rule; "no" means it does not. If GPT-4-turbo answers "yes", the error passes through and is displayed to the user. If it answers "no", GPT-3.5 is forced to write the story without regard to the rules. This works without a continuous feedback loop because GPT-4-turbo is significantly more capable and makes more accurate assessments.
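A minimal sketch of this single-shot appeal, assuming the OpenAI Python client; the prompt wording and the `gpt-4-turbo-preview` model string are placeholders for whatever the app actually uses.

```python
# Sketch of the single-shot appeal; prompt wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()


def appeal_rule_break(rules, user_message, gpt35_claim):
    """Ask GPT-4-turbo to confirm ('yes') or reject ('no') GPT-3.5's rule-break claim."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": (
                "You verify rule-violation claims. Reply with exactly "
                "'yes' if the claim is correct or 'no' if it is not."
            )},
            {"role": "user", "content": (
                f"Rules:\n{rules}\n\n"
                f"User message:\n{user_message}\n\n"
                f"GPT-3.5's claim:\n{gpt35_claim}"
            )},
        ],
        max_tokens=3,  # the reply should only be "yes" or "no"
    )
    return response.choices[0].message.content.strip().lower() == "yes"
```

If the call returns `True`, the rule-break error is surfaced to the user; if it returns `False`, GPT-3.5 is re-prompted to write the story without regard to the rules.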

GPT Function Calling

The limitation of this approach is that GPT-3.5's response should only be evaluated when it actually makes a claim about a rule violation. How can the program detect that such a claim has been made? Hardcoding the program to look for certain keywords would fail, because some generated stories legitimately contain those keywords. The remaining option is to let the AI call a function when it deems it necessary. That function is the evaluation function that appeals the claim to GPT-4-turbo. GPT-3.5 decides when to call the evaluation function, namely when it believes a rule violation has occurred. This automates the evaluation process and prevents the program from mistaking ordinary story text for a rule-violation claim.
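A sketch of how this could look with the OpenAI function-calling (tools) API; the tool name `report_rule_violation` and its parameters are hypothetical, not names taken from the codebase.

```python
# Sketch of letting GPT-3.5 decide when to trigger the appeal via function
# calling; the tool name and parameters here are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "report_rule_violation",
        "description": "Call this only when the user's request breaks one of the story rules.",
        "parameters": {
            "type": "object",
            "properties": {
                "rule": {"type": "string", "description": "The rule that was broken."},
                "reason": {"type": "string", "description": "Why the request breaks it."},
            },
            "required": ["rule", "reason"],
        },
    },
}]


def generate_or_flag(messages):
    """Let GPT-3.5 either continue the story or flag a rule violation."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=TOOLS,
        tool_choice="auto",  # the model decides whether to call the function
    )
    message = response.choices[0].message
    if message.tool_calls:  # GPT-3.5 claims a rule violation
        claim = json.loads(message.tool_calls[0].function.arguments)
        return ("violation", claim)  # hand this claim to the GPT-4-turbo appeal
    return ("story", message.content)  # ordinary story text, no claim made
```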