e-p-armstrong / augmentoolkit

Convert Compute And Books Into Instruct-Tuning Datasets! Makes: QA, RP, Classifiers.
MIT License

USE_FILENAMES option is broken #68

Open panterarocks49 opened 1 week ago

panterarocks49 commented 1 week ago

Hi @e-p-armstrong, first of all, thank you for making this library!

I've found a couple of issues when trying to use the USE_FILENAMES option.

The first was a "FAILED TO GENERATE QUESTIONS!" error, for which I submitted a fix in PR #67: https://github.com/e-p-armstrong/augmentoolkit/pull/67

The next issue I'm running into is in the ContextRepairer pipeline step: the AI is judging every Q/A pair as bad. Here is one of the error outputs:

```
2024-10-11 17:11:38,389 - ERROR - Above prompt resulted in error, probably the model's fault: error in judgement extraction (ans relevancy)
Traceback (most recent call last):
  File "/Users/josh/code/pg-bot/augmentoolkit/augmentoolkit/generation_functions/generation_step_class.py", line 145, in generate
    ret = self.output_processor(response)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/code/pg-bot/augmentoolkit/original/steps.py", line 238, in extract_reasoning_from_context_check
    raise Exception("error in judgement extraction (ans relevancy)")
Exception: error in judgement extraction (ans relevancy)
```

Here is the end of the prompt that produced that error (truncated, because the rest of the prompt can be found in check_qatuples_context_filenames.yaml):

```yaml
- content: 'Text details: {textname}

    Note that while you have access to this information, for the sake of rewording
    questions, you should evaluate the question as if you could not see this.

    Question: What does the author mean by "managing up" in the context of experienced
    C-level executives?

    Answer: The author implies that "managing up" refers to the ability of experienced
    C-level executives to effectively navigate and influence the higher levels of
    the company''s organizational hierarchy.'
  role: user
```

```
Response:
-----

## Reasoning and thought process:
### Question Context Validation
#### Special Term Context Check: Specifically check for use of the terms "book", "text", "passage", and "excerpt" without context about which specific thing is being discussed. The question does not misuse any specific terms without proper context.
#### Text and Author Specificity: The question does not require a reference to a specific text or author as it is asking about general knowledge.
#### Scope and Precision: The question is precise in asking about the meaning of "managing up" in a specific context.

### Answer Context Validation:
#### Special Term Context Check: Specifically check for use of the terms "book", "text", "passage", and "excerpt" without context about which specific thing is being discussed. The answer does not use vague terms without context.
#### Specificity and Clarity: The answer is clear, providing a direct response to the question based on general knowledge.
#### Answer-Only Context Issues: The answer does not introduce any vague or unspecified external material and is valid as it stands.

### Critical Evaluation and Final Judgment:
#### Evaluation: Both the question and answer are precise and do not require additional context for understanding.
#### Final judgment: Pass.
```

One thing I noticed is that {textname} is not filled in. Do you think that could be causing this issue?
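An unfilled {textname} can be caught before the prompt ever reaches the model. The following is a minimal, hypothetical debugging helper, not part of augmentoolkit: find_unfilled_placeholders and assert_prompt_filled are illustrative names, and the sketch assumes prompts use single-brace {name} placeholders with doubled braces for literals, as in Python's str.format.

```python
import re

# Hypothetical helper (not part of augmentoolkit). Matches simple
# str.format-style placeholders such as {textname}; the lookarounds skip
# doubled braces like {{literal}}, which str.format treats as escapes.
PLACEHOLDER = re.compile(r"(?<!\{)\{([A-Za-z_][A-Za-z0-9_]*)\}(?!\})")


def find_unfilled_placeholders(rendered_prompt: str) -> list[str]:
    """Return names of placeholders still present after substitution."""
    # dict.fromkeys de-duplicates while keeping first-seen order
    return list(dict.fromkeys(PLACEHOLDER.findall(rendered_prompt)))


def assert_prompt_filled(rendered_prompt: str) -> None:
    """Raise before the prompt is sent if any placeholder was left unfilled."""
    missing = find_unfilled_placeholders(rendered_prompt)
    if missing:
        raise ValueError(f"unfilled placeholders in prompt: {missing}")


if __name__ == "__main__":
    rendered = (
        "Text details: {textname}\n"
        'Question: What does the author mean by "managing up"?'
    )
    print(find_unfilled_placeholders(rendered))  # prints ['textname']
```

Running a check like this on each rendered prompt would show whether {textname} is substituted at all, separately from whether it affects the judgement parsing.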

The other thing is that the no-filenames prompt uses all-caps PASS instead of Pass. Maybe that is it?

I will try to dig into this more when I get the time.

e-p-armstrong commented 1 week ago

Hey, thanks very much for bringing this up. This is on me: I have not kept USE_FILENAMES consistent with the prompts as they have been overhauled over time. What's likely happening is a mismatch, in output format or something else, between the old prompts (the USE_FILENAMES prompts) and the new code. I'll see if I can put together a good PR for this soon.

The all-caps PASS is probably the root cause, though that entire setting needs to be brought into the 21st century generally, so I'll do a pass on it.
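If the casing mismatch is the root cause, the failure can be reproduced and avoided with a minimal sketch like the one below. It assumes, hypothetically, that the judgement extraction searches case-sensitively for an all-caps verdict after "Final judgment:"; the actual extract_reasoning_from_context_check in original/steps.py was not shown in this thread and may work differently. The verdict set (PASS, REWORD, FAIL) and both function names are illustrative assumptions.

```python
import re


def extract_judgment_strict(response: str) -> str:
    """Hypothetical case-sensitive parser: only accepts all-caps verdicts."""
    match = re.search(r"Final judgment:\s*(PASS|REWORD|FAIL)", response)
    if match is None:
        raise ValueError("error in judgement extraction")
    return match.group(1)


def extract_judgment_tolerant(response: str) -> str:
    """Hypothetical tolerant parser: ignores case and optional **bold**."""
    match = re.search(
        r"Final judgment:\s*\**\s*(pass|reword|fail)\b",
        response,
        re.IGNORECASE,
    )
    if match is None:
        raise ValueError("error in judgement extraction")
    # Normalize so downstream code can keep comparing against "PASS" etc.
    return match.group(1).upper()


if __name__ == "__main__":
    sample = "#### Evaluation: ...\n#### Final judgment: Pass."
    try:
        extract_judgment_strict(sample)
    except ValueError as err:
        print("strict parser:", err)  # strict parser: error in judgement extraction
    print("tolerant parser:", extract_judgment_tolerant(sample))  # tolerant parser: PASS
```

Normalizing to upper case means only the parser has to change; the other option is to update the USE_FILENAMES prompt so it emits PASS like the no-filenames prompt.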