Open agarwl opened 1 year ago
Hi, just to better understand the assumed structure of the DataFrame: do we have one row for each step? Are these all the steps during evaluation (not training) on all the tasks?
And we'd assume separate DataFrames for each approach, which are each read separately by `get_all_return_values()`?
E.g., to construct the required dict for computing performance profiles.
Yeah, for performance profiles, the data frames contain per-step results from evaluation (obtained during the course of training).
For aggregate metrics, we use the final performance, so that corresponds to evaluation results at the final step or a pre-specified step.
Right now, we only support loading data from numpy arrays. It would be nice to have a helper function that converts a DataFrame of scores to numpy arrays. Some initial code to illustrate what this might look like:
The above code assumes we have a `pandas` DataFrame with keys `run_number`, `game`, and `normalized_score` containing scores for all steps (in an ordered manner).
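For readers who want something concrete to start from, here is a minimal sketch of a DataFrame-to-numpy conversion under the column assumptions described above. It is only an illustration: the function name `scores_to_array`, the `(num_runs, num_games, num_steps)` output layout, and the requirement that every run of every game has the same number of steps are assumptions made for this example, not part of the library.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def scores_to_array(df: pd.DataFrame) -> Tuple[np.ndarray, List]:
    """Hypothetical helper: stack per-step scores into one array.

    Assumes columns `run_number`, `game`, `normalized_score`, with rows
    already ordered by step within each (game, run) pair, and the same
    number of steps for every pair (np.stack raises otherwise).
    Returns an array of shape (num_runs, num_games, num_steps) and the
    sorted list of games matching the second axis.
    """
    games = sorted(df["game"].unique())
    runs = sorted(df["run_number"].unique())
    per_game = []
    for game in games:
        game_df = df[df["game"] == game]
        per_run = [
            game_df.loc[game_df["run_number"] == run, "normalized_score"].to_numpy()
            for run in runs
        ]
        per_game.append(np.stack(per_run, axis=0))  # (num_runs, num_steps)
    # (num_games, num_runs, num_steps) -> (num_runs, num_games, num_steps)
    return np.stack(per_game, axis=0).transpose(1, 0, 2), games


# Toy data: 2 runs x 2 games x 2 evaluation steps.
df = pd.DataFrame({
    "run_number": [0, 0, 1, 1, 0, 0, 1, 1],
    "game": ["pong"] * 4 + ["breakout"] * 4,
    "normalized_score": [0.1, 0.5, 0.2, 0.6, 0.0, 0.3, 0.1, 0.4],
})
scores, games = scores_to_array(df)

# Final-step scores, shape (num_runs, num_games), as used for aggregate metrics.
final_scores = scores[:, :, -1]
```

With one DataFrame per approach, calling this once per approach and collecting the results in a dict keyed by approach name would give the kind of mapping discussed above. Slicing at a pre-specified step instead of `-1` works the same way.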