google-research / rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
https://agarwl.github.io/rliable
Apache License 2.0

Add support for loading data from pandas dataframe #18

Open agarwl opened 1 year ago

agarwl commented 1 year ago

Right now, we only support loading data from numpy arrays. It would be nice to have a helper function that converts a dataframe of scores to numpy arrays. Here is some initial code to illustrate what this might look like:


import numpy as np

def get_all_return_values(df):
  """Maps each game to an array of shape (num_runs, num_steps)."""
  games = list(df['game'].unique())
  return_vals = {}
  for game in games:
    game_df = df[df['game'] == game]
    # One list of per-step scores per run; rows must already be in step order.
    arr = game_df.groupby('run_number')['normalized_score'].apply(list).values
    return_vals[game] = np.stack(arr, axis=0)
  return return_vals

def convert_to_matrix(x):
  """Stacks the per-game arrays (games sorted by name) along axis 1."""
  return np.stack([x[k] for k in sorted(x.keys())], axis=1)

## Usage
# Array of shape (num_runs, num_games, num_steps)
all_normalized_scores = convert_to_matrix(get_all_return_values(score_df))

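The above code assumes a pandas DataFrame with columns `run_number`, `game` and `normalized_score`, containing the scores for all steps (in an ordered manner). Every run of every game must have the same number of steps, otherwise the arrays cannot be stacked.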
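If the rows are not guaranteed to be in step order, one option is to sort explicitly before reshaping. The following is only a rough sketch of that idea, not something that exists in rliable: the function name `dataframe_to_score_array` and the extra `step` column are assumptions made for illustration, and the data is a toy example.

```python
import numpy as np
import pandas as pd

def dataframe_to_score_array(df, run_col='run_number', game_col='game',
                             step_col='step', score_col='normalized_score'):
  """Returns an array of shape (num_runs, num_games, num_steps).

  Hypothetical helper: assumes every (run, game) pair logged the same
  steps exactly once. `step_col` is an assumed extra column.
  """
  # Sorting the MultiIndex orders rows by run, then game, then step.
  scores = df.set_index([run_col, game_col, step_col])[score_col].sort_index()
  if scores.index.has_duplicates:
    raise ValueError('Duplicate (run, game, step) rows found.')
  shape = tuple(df[c].nunique() for c in (run_col, game_col, step_col))
  if len(scores) != np.prod(shape):
    raise ValueError('Some (run, game, step) combinations are missing.')
  return scores.to_numpy().reshape(shape)

# Toy data: 2 runs x 2 games x 3 steps, deliberately shuffled.
rows = [
    {'run_number': r, 'game': g, 'step': s,
     'normalized_score': 100 * r + 10 * gi + s}
    for r in range(2)
    for gi, g in enumerate(['Breakout', 'Pong'])
    for s in range(3)
]
score_df = pd.DataFrame(rows).sample(frac=1.0, random_state=0)

all_normalized_scores = dataframe_to_score_array(score_df)
print(all_normalized_scores.shape)  # (2, 2, 3)
```

Games end up in sorted order along axis 1, which matches what `sorted(x.keys())` does in the snippet above.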

stefanbschneider commented 1 year ago

Hi, just to better understand the assumed structure of the DataFrame: do we have one row for each step? Are these all the steps during evaluation (not training) on all the tasks?

And we'd assume a separate DataFrame for each approach, each read separately by get_all_return_values(), e.g. to construct the dict required for computing performance profiles?

agarwl commented 1 year ago

Yeah, for performance profiles, the data frames contain per-step results from evaluation (obtained during the course of training).
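To make the one-DataFrame-per-approach idea concrete, here is a minimal sketch using only numpy and pandas. It is not rliable's implementation: the helper names, the `step` column, the algorithm names and the toy data are all assumptions. It reduces each DataFrame to its final-step scores and then computes the fraction of (run, game) scores above each threshold, which is the quantity a run-score performance profile plots.

```python
import numpy as np
import pandas as pd

def final_score_matrix(df):
  """Returns final-step scores as an array of shape (num_runs, num_games).

  Assumes a hypothetical `step` column alongside `run_number`, `game`
  and `normalized_score`.
  """
  final = (df.sort_values('step')
             .groupby(['run_number', 'game'])['normalized_score']
             .last())
  # Rows are runs, columns are games (sorted by name).
  return final.unstack('game').to_numpy()

def score_distribution(scores, taus):
  """Fraction of (run, game) scores strictly above each threshold."""
  return np.mean(scores[..., None] > np.asarray(taus), axis=(0, 1))

def make_toy_df(offset):
  """Toy data: 2 runs x 2 games x 3 evaluation steps."""
  rows = [
      {'run_number': r, 'game': g, 'step': s,
       'normalized_score': offset + r + 0.5 * gi + 0.25 * s}
      for r in range(2)
      for gi, g in enumerate(['Breakout', 'Pong'])
      for s in range(3)
  ]
  return pd.DataFrame(rows)

# One DataFrame per approach (names are made up).
dfs = {'algo_a': make_toy_df(0.0), 'algo_b': make_toy_df(1.0)}
score_dict = {name: final_score_matrix(df) for name, df in dfs.items()}

taus = [0.0, 1.0, 2.0]
profiles = {name: score_distribution(m, taus) for name, m in score_dict.items()}
print(profiles['algo_a'])  # [1.  0.5 0. ]
```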

For aggregate metrics, we use the final performance, so that corresponds to evaluation results at the final step or a pre-specified step.
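As a small illustration of that last point, assuming an array laid out as (num_runs, num_games, num_steps), the final or a pre-specified step is just a slice along the last axis. The interquartile mean below is a plain numpy sketch of a 25% trimmed mean over all pooled scores, written for illustration rather than copied from the library; the array values are made up.

```python
import numpy as np

def interquartile_mean(scores):
  """Mean of the middle 50% of all (run, game) scores pooled together."""
  flat = np.sort(np.asarray(scores, dtype=float), axis=None)
  cut = int(0.25 * flat.size)  # number of scores dropped at each end
  return flat[cut:flat.size - cut].mean()

# Toy array of shape (num_runs, num_games, num_steps) = (2, 4, 3).
all_normalized_scores = np.arange(24, dtype=float).reshape(2, 4, 3)

final_scores = all_normalized_scores[:, :, -1]     # shape (2, 4)
scores_at_step_1 = all_normalized_scores[:, :, 1]  # a pre-specified step

iqm_final = interquartile_mean(final_scores)
print(final_scores.shape, iqm_final)  # (2, 4) 12.5
```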