Knguyen-dev / SDEV-265-Group-4

Shared repo for sdev 265 group

Using `EvalGPT` to Correct ChatGPT Audits #31

Open Frankbbg opened 10 months ago

Frankbbg commented 10 months ago

Attempting to Solve Incorrect Rule Evaluations

Issue Description

When using ChatGPT to audit AI evaluations, we've observed a recurring problem: ChatGPT often affirms incorrect audits, which makes the evaluation process unreliable. This undermines both the quality of the audits and the trustworthiness of the system. We need to address this so that the auditing AI reliably catches evaluations that are actually incorrect instead of simply agreeing with them.

Proposed Solution

Introducing the EvalGPT Class

I propose creating a new class called EvalGPT that is specifically designed for the task of auditing AI evaluations. EvalGPT will focus on providing more accurate assessments of AI audits. This class will be responsible for critical evaluations and will be designed to avoid self-affirmation issues.
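For illustration, here is a minimal sketch of what such a class could look like in Python. The class name `EvalGPT` comes from this proposal, but the method name, prompt wording, and model choice below are placeholders rather than a final design.

```python
# Sketch only: method name, prompt wording, and model choice are placeholders,
# not the final implementation.
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class EvalGPT:
    """Audits rule-violation claims made by the main ChatGPT instance."""

    SYSTEM_PROMPT = (
        "You are a strict auditor. Treat every rule-violation claim as "
        "unproven ('guilty until proven innocent'). Point out any discrepancy "
        "between the cited rule and the user's actual message."
    )

    def __init__(self, model="gpt-4-turbo-preview"):
        self.model = model

    def evaluate_audit(self, rules, user_message, audit):
        """Return EvalGPT's critique of a single audit."""
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.SYSTEM_PROMPT},
                {"role": "user", "content": (
                    f"Rules:\n{rules}\n\n"
                    f"User message:\n{user_message}\n\n"
                    f"Audit to evaluate:\n{audit}"
                )},
            ],
        )
        return response.choices[0].message.content
```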

Feedback Loop with ChatGPT

To mitigate the self-affirmation problem, I will introduce a feedback loop between EvalGPT and ChatGPT. The process will work as follows (a rough sketch of the loop appears after the list):

  1. ChatGPT presents an AI audit to EvalGPT for evaluation.
  2. EvalGPT uses predefined criteria to assess the audit, following a 'guilty until proven innocent' approach.
  3. If EvalGPT identifies any discrepancies or errors in the audit, it provides detailed feedback and suggestions for improvement.
  4. ChatGPT receives this feedback and is expected to revise its initial evaluation based on the feedback from EvalGPT.
  5. The revised evaluation is presented again to EvalGPT for re-evaluation.
  6. This feedback loop continues until EvalGPT and ChatGPT converge on an accurate audit assessment.
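As a rough illustration, the steps above could be wired together like this, reusing the `EvalGPT` sketch from earlier. The helpers `chatgpt_audit` and `chatgpt_revise`, the string-based convergence check, and the `max_rounds` cap are all hypothetical; the proposal itself does not specify them.

```python
# Rough sketch of the proposed loop; chatgpt_audit, chatgpt_revise, and the
# convergence check are hypothetical helpers, and max_rounds is a safety cap
# not specified in the proposal.
def run_feedback_loop(rules, user_message, chatgpt_audit, chatgpt_revise,
                      evaluator, max_rounds=5):
    audit = chatgpt_audit(rules, user_message)                           # step 1
    for _ in range(max_rounds):
        feedback = evaluator.evaluate_audit(rules, user_message, audit)  # steps 2-3
        if "no discrepancies" in feedback.lower():                       # step 6: converged
            return audit
        audit = chatgpt_revise(audit, feedback)                          # steps 4-5
    return audit  # stop after max_rounds to keep token usage bounded
```

Capping the number of rounds is one way to keep token usage bounded if the two models never converge on their own.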

Benefits

Next Steps

Frankbbg commented 10 months ago

Updates to this Model

Feedback Loop Update

The feedback loop no longer runs until the two models "come to an agreement". That behavior has been removed because of its high token usage, and I have replaced it with a smarter, less expensive method. In the new method, the newest GPT-4-turbo model is given the rules and the context of the previous rule-break response and is asked to reply with a simple "yes" or "no". "Yes" means GPT-4-turbo agrees with GPT-3.5's claim that the user broke a rule; "no" means it does not. If GPT-4-turbo answers "yes", the error passes through and is displayed to the user. If it answers "no", GPT-3.5 is forced to write the story without regard to the rules. This works without a continuous feedback loop because GPT-4-turbo is significantly more capable and makes more accurate assessments.
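A minimal sketch of this single-shot appeal, assuming the OpenAI Python client; the prompt wording and the `gpt-4-turbo-preview` model string are placeholders for whatever the app actually uses.

```python
# Sketch of the single-shot appeal; prompt wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()


def appeal_rule_break(rules, user_message, gpt35_claim):
    """Ask GPT-4-turbo to confirm ('yes') or reject ('no') GPT-3.5's rule-break claim."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": (
                "You verify rule-violation claims. Reply with exactly "
                "'yes' if the claim is correct or 'no' if it is not."
            )},
            {"role": "user", "content": (
                f"Rules:\n{rules}\n\n"
                f"User message:\n{user_message}\n\n"
                f"GPT-3.5's claim:\n{gpt35_claim}"
            )},
        ],
        max_tokens=3,  # the reply should only be "yes" or "no"
    )
    return response.choices[0].message.content.strip().lower() == "yes"
```

If the call returns `True`, the rule-break error is surfaced to the user; if it returns `False`, GPT-3.5 is re-prompted to write the story without regard to the rules.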

GPT Function Calling

The limitation of this approach is that GPT-3.5's response should only be evaluated when it actually makes a claim about a rule violation. How can the program detect that such a claim has been made? Hardcoding the program to look for certain keywords would fail, because some generated stories legitimately contain those keywords. The remaining option is to let the AI call a function when it deems it necessary. That function is the evaluation function that appeals the claim to GPT-4-turbo. GPT-3.5 decides when to call the evaluation function, namely when it believes a rule violation has occurred. This automates the evaluation process and prevents the program from mistaking ordinary story text for a rule-violation claim.
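A sketch of how this could look with the OpenAI function-calling (tools) API; the tool name `report_rule_violation` and its parameters are hypothetical, not names taken from the codebase.

```python
# Sketch of letting GPT-3.5 decide when to trigger the appeal via function
# calling; the tool name and parameters here are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "report_rule_violation",
        "description": "Call this only when the user's request breaks one of the story rules.",
        "parameters": {
            "type": "object",
            "properties": {
                "rule": {"type": "string", "description": "The rule that was broken."},
                "reason": {"type": "string", "description": "Why the request breaks it."},
            },
            "required": ["rule", "reason"],
        },
    },
}]


def generate_or_flag(messages):
    """Let GPT-3.5 either continue the story or flag a rule violation."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=TOOLS,
        tool_choice="auto",  # the model decides whether to call the function
    )
    message = response.choices[0].message
    if message.tool_calls:  # GPT-3.5 claims a rule violation
        claim = json.loads(message.tool_calls[0].function.arguments)
        return ("violation", claim)  # hand this claim to the GPT-4-turbo appeal
    return ("story", message.content)  # ordinary story text, no claim made
```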