Multiple answer generation for the harmfulness eval. The number of answers generated per prompt is now set by passing the --num_generations_per_prompt flag to the beavertails_get_model_answers.py script. The default is 5.
Parameter name changed! As before, model generations are stored in a JSON file, but one key has been renamed: response (previously holding a single generated string) is now responses and holds an array of strings.
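For downstream code that still reads older files, the rename can be absorbed with a small helper. The following is only a sketch: the helper name `get_responses` is hypothetical, and it assumes each entry in the JSON file is a dict carrying either the old `response` key or the new `responses` key. It makes no claim about the rest of the file layout.

```python
def get_responses(record):
    """Return the generations of one record as a list of strings.

    Hypothetical helper, not part of this PR. It accepts both the new
    `responses` key (array of strings) and the old `response` key
    (single string), so older files remain readable.
    """
    if "responses" in record:
        return list(record["responses"])
    if "response" in record:
        # Old format: wrap the single generation in a list.
        return [record["response"]]
    raise KeyError("record has neither 'responses' nor 'response'")


old_record = {"response": "single answer"}
new_record = {"responses": ["answer 1", "answer 2"]}
print(get_responses(old_record))
print(get_responses(new_record))
```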
Majority voting is implemented in evaluate_outputs.py, where the beaver-dam model classifies each generated response to each question as harmful or safe. For further evaluation, the script selects the first response in the ordered responses array whose classification matches the majority class.
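The selection rule described above can be sketched as follows. This is an illustration, not the code in evaluate_outputs.py: the function name and the boolean `flags` list (True meaning "classified as harmful") are assumptions. Tie handling is not specified in this PR; the sketch falls back on `Counter.most_common`, which resolves ties in favor of the class seen first.

```python
from collections import Counter


def select_majority_response(responses, flags):
    """Pick the first response whose flag equals the majority flag.

    responses: ordered list of generated strings.
    flags: list of booleans of the same length (assumed: True = harmful).
    Returns (index, response, majority_flag).
    """
    if not responses or len(responses) != len(flags):
        raise ValueError("responses and flags must be non-empty and equally long")
    # Most frequent classification; ties go to the class encountered first.
    majority_flag, _ = Counter(flags).most_common(1)[0]
    # First position in the ordered array that carries the majority class.
    index = flags.index(majority_flag)
    return index, responses[index], majority_flag


responses = ["a", "b", "c", "d", "e"]
flags = [False, True, True, False, True]
print(select_majority_response(responses, flags))  # (1, 'b', True)
```

With the default of 5 generations and two classes a tie cannot occur, so the tie rule only matters when an even number of generations is requested.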
As before, the evaluations are stored in a JSON file, where the response and flagged fields correspond to the selected response. In addition, an all_responses array stores every response together with its classification.
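A stored evaluation entry might then look roughly like the output of the sketch below. Only the `response`, `flagged` and `all_responses` names come from this PR; the `prompt` key, the shape of each `all_responses` item, and the function name are assumptions made for illustration.

```python
import json


def build_evaluation_record(prompt, responses, flags, selected_index):
    """Assemble one evaluation entry (hypothetical layout).

    `response` and `flagged` describe the selected generation, while
    `all_responses` keeps every generation with its classification.
    """
    return {
        "prompt": prompt,  # assumed key name
        "response": responses[selected_index],
        "flagged": flags[selected_index],
        "all_responses": [
            {"response": r, "flagged": f}  # assumed item shape
            for r, f in zip(responses, flags)
        ],
    }


record = build_evaluation_record(
    "example question", ["a", "b", "c"], [False, True, True], selected_index=1
)
print(json.dumps(record, indent=2))
```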
Closes: #91.