Problem 1
Currently, we define the schema for what gets written to the saved output file (generated by util.save_dataset) in the EvalOutputRecord class. This class has attributes like model_input, sent_less_output, etc. that correspond to columns in the Ray Dataset produced by an EvalAlgorithm's evaluate method.
However, the column names in the produced Ray Dataset are governed by the *_COLUMN_NAME constants in constants.py. Thus, we need to ensure parity between these column name constants and the attributes of EvalOutputRecord. Specifically, this code in EvalOutputRecord.from_row requires that the attribute names in EvalOutputRecord exactly match their corresponding *_COLUMN_NAME strings.
Currently, there is no mechanism for ensuring such parity. There is simply a comment in the docstring of the EvalOutputRecord class. Even if the comment is acknowledged by an engineer, there is still room for manual error, like typos.
Problem 2
Currently, EvalOutputRecord looks to the set COLUMN_NAMES as the source of truth regarding Ray Dataset column names, but not every *_COLUMN_NAME constant is included in this set. Since there's nothing enforcing that we include every new *_COLUMN_NAME constant in COLUMN_NAMES, it is very easy to accidentally skip the step of updating COLUMN_NAMES.
This problem has already resulted in a bug. In this PR, the attributes prompt, sent_more_prompt, and sent_less_prompt were added to EvalOutputRecord. While the corresponding constants SENT_MORE_PROMPT_COLUMN_NAME = "sent_more_prompt" and SENT_LESS_PROMPT_COLUMN_NAME = "sent_less_prompt" exist in constants.py (note that PROMPT_COLUMN_NAME = "prompt" is currently being defined in each eval algo's code, which I have changed in this PR), they aren't included in COLUMN_NAMES, which means that this snippet will not "pick up" the prompt, sent_more_prompt, and sent_less_prompt columns when constructing the EvalOutputRecord object.
Proposed changes
Fix for Problem 1
I propose that we get rid of the attributes in EvalOutputRecord that correspond to the non-score columns (ex: model_input, sent_less_output, etc), and instead use a single attribute, non_score_columns, which is aDict, to encode the same information content. The keys to this dict will be validated in EvalOutputRecord's __post_init__ method by comparing them to constants.COLUMN_NAMES.
Fix for Problem 2
I have added a new Enum, ColumnNames, which encapsulates all of the *_COLUMN_NAME constants. The set COLUMN_NAMES is now defined automatically, using the values in ColumnNames. Because COLUMN_NAMES is created from ColumnNames, we will never run into the issue where we add a new *_COLUMN_NAME constant but COLUMN_NAMES doesn't get updated accordingly.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available:
Description of changes:
Current state of affairs
Problem 1 Currently, we define the schema for what gets written to the saved output file (generated by
util.save_dataset
) in theEvalOutputRecord
class. This class has attributes likemodel_input
,sent_less_output
, etc. that correspond to columns in the Ray Dataset produced by an EvalAlgorithm'sevaluate
method.However, the column names in the produced Ray Dataset are governed by the
*_COLUMN_NAME
constants inconstants.py
. Thus, we need to ensure parity between these column name constants and the attributes ofEvalOutputRecord
. Specifically, this code inEvalOutputRecord.from_row
requires that the attribute names inEvalOutputRecord
exactly match their corresponding*_COLUMN_NAME
strings.Currently, there is no mechanism for ensuring such parity. There is simply a comment in the docstring of the
EvalOutputRecord
class. Even if the comment is acknowledged by an engineer, there is still room for manual error, like typos.Problem 2 Currently,
EvalOutputRecord
looks to the setCOLUMN_NAMES
as the source of truth regarding Ray Dataset column names, but not every*_COLUMN_NAME
constant is included in this set. Since there's nothing enforcing that we include every new*_COLUMN_NAME
constant inCOLUMN_NAMES
, it is very easy to accidentally skip the step of updatingCOLUMN_NAMES
.This problem has already resulted in a bug. In this PR, the attributes
prompt
,sent_more_prompt
, andsent_less_prompt
were added toEvalOutputRecord
. While the corresponding constantsSENT_MORE_PROMPT_COLUMN_NAME = "sent_more_prompt"
andSENT_LESS_PROMPT_COLUMN_NAME = "sent_less_prompt"
exist inconstants.py
(note thatPROMPT_COLUMN_NAME = "prompt"
is currently being defined in each eval algo's code, which I have changed in this PR), they aren't included inCOLUMN_NAMES
, which means that this snippet will not "pick up" the prompt, sent_more_prompt, and sent_less_prompt columns when constructing theEvalOutputRecord
object.Proposed changes
Fix for Problem 1 I propose that we get rid of the attributes in
EvalOutputRecord
that correspond to the non-score columns (ex:model_input
,sent_less_output
, etc), and instead use a single attribute,non_score_columns
, which is aDict
, to encode the same information content. The keys to this dict will be validated inEvalOutputRecord
's__post_init__
method by comparing them toconstants.COLUMN_NAMES
.Fix for Problem 2 I have added a new Enum,
ColumnNames
, which encapsulates all of the*_COLUMN_NAME
constants. The setCOLUMN_NAMES
is now defined automatically, using the values inColumnNames
. BecauseCOLUMN_NAMES
is created fromColumnNames
, we will never run into the issue where we add a new*_COLUMN_NAME
constant butCOLUMN_NAMES
doesn't get updated accordingly.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.