aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0
155 stars 40 forks source link

fix: create single source of truth for dataset column names #171

Closed danielezhu closed 5 months ago

danielezhu commented 5 months ago

Issue #, if available:

Description of changes:

Current state of affairs

Problem 1 Currently, we define the schema for what gets written to the saved output file (generated by util.save_dataset) in the EvalOutputRecord class. This class has attributes like model_input, sent_less_output, etc. that correspond to columns in the Ray Dataset produced by an EvalAlgorithm's evaluate method.

However, the column names in the produced Ray Dataset are governed by the *_COLUMN_NAME constants in constants.py. Thus, we need to ensure parity between these column name constants and the attributes of EvalOutputRecord. Specifically, this code in EvalOutputRecord.from_row requires that the attribute names in EvalOutputRecord exactly match their corresponding *_COLUMN_NAME strings.

Currently, there is no mechanism for ensuring such parity. There is simply a comment in the docstring of the EvalOutputRecord class. Even if the comment is acknowledged by an engineer, there is still room for manual error, like typos.

Problem 2 Currently, EvalOutputRecord looks to the set COLUMN_NAMES as the source of truth regarding Ray Dataset column names, but not every *_COLUMN_NAME constant is included in this set. Since there's nothing enforcing that we include every new *_COLUMN_NAME constant in COLUMN_NAMES, it is very easy to accidentally skip the step of updating COLUMN_NAMES.

This problem has already resulted in a bug. In this PR, the attributes prompt, sent_more_prompt, and sent_less_prompt were added to EvalOutputRecord. While the corresponding constants SENT_MORE_PROMPT_COLUMN_NAME = "sent_more_prompt" and SENT_LESS_PROMPT_COLUMN_NAME = "sent_less_prompt" exist in constants.py (note that PROMPT_COLUMN_NAME = "prompt" is currently being defined in each eval algo's code, which I have changed in this PR), they aren't included in COLUMN_NAMES, which means that this snippet will not "pick up" the prompt, sent_more_prompt, and sent_less_prompt columns when constructing the EvalOutputRecord object.

Proposed changes

Fix for Problem 1 I propose that we get rid of the attributes in EvalOutputRecord that correspond to the non-score columns (ex: model_input, sent_less_output, etc), and instead use a single attribute, non_score_columns, which is aDict, to encode the same information content. The keys to this dict will be validated in EvalOutputRecord's __post_init__ method by comparing them to constants.COLUMN_NAMES.

Fix for Problem 2 I have added a new Enum, ColumnNames, which encapsulates all of the *_COLUMN_NAME constants. The set COLUMN_NAMES is now defined automatically, using the values in ColumnNames. Because COLUMN_NAMES is created from ColumnNames, we will never run into the issue where we add a new *_COLUMN_NAME constant but COLUMN_NAMES doesn't get updated accordingly.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.