google-research / rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
https://agarwl.github.io/rliable
Apache License 2.0

Request for Code Implementation of "Rank Comparison" Figure in Rliable Paper #28

Closed: amantuer closed this issue 4 days ago

amantuer commented 5 days ago

I am looking for the code behind the "Rank Comparison" figure, which is described in the paper but does not appear to be provided in the GitHub repository.

In particular, is there a publicly available implementation that produces the "Rank Comparison" figure, similar to the one presented in the paper (e.g., Figure A.33)? I have checked the Rliable repository and documentation but could not locate the relevant code.

Would it be possible to share this code or guide me on how to implement this figure using Rliable? This would be incredibly helpful for my research, and I would greatly appreciate any assistance or resources you can provide.

Thank you for your time and for making such a valuable tool available to the research community. I look forward to your response.

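agarwl commented 4 days ago

#@title Rank Computation Helpers

import collections

import numpy as np

# NOTE: `subsample_scores_mat`, `dmc_score_dict` and `algs` are not defined in
# this snippet; presumably they are defined in the colab linked in this thread.


def get_rank_matrix(score_dict, n=100000, algorithms=None):
  """Estimates, per task, how often each algorithm attains each rank.

  Returns an array of shape (num_tasks, num_algs, num_algs) whose entry
  [task, alg, rank] is the fraction of the n samples in which `alg` is
  placed at `rank` on `task` (rank 0 is the highest score).
  """
  arr = []
  if algorithms is None:
    algorithms = sorted(score_dict.keys())
  print(f'Using algorithms: {algorithms}')
  for alg in algorithms:
    arr.append(subsample_scores_mat(
        score_dict[alg], num_samples=n, replace=True))
  X = np.stack(arr, axis=0)  # Shape: (num_algs, n, num_tasks).
  num_algs, _, num_tasks = X.shape
  all_mat = []
  for task in range(num_tasks):
    # Sort on negated scores: lexsort sorts in ascending order, so rank 0
    # corresponds to the highest score, rank 1 to the second highest, etc.
    task_x = -X[:, :, task]
    # Random secondary key, used to break ties randomly.
    rand_x = np.random.random(size=task_x.shape)
    # np.lexsort treats the last key as the primary key.
    indices = np.lexsort((rand_x, task_x), axis=0)
    mat = np.zeros((num_algs, num_algs))
    for rank in range(num_algs):
      # indices[rank] holds, for each sample, the algorithm placed at `rank`.
      cnts = collections.Counter(indices[rank])
      mat[:, rank] = np.array([cnts[i] / n for i in range(num_algs)])
    all_mat.append(mat)
  all_mat = np.stack(all_mat, axis=0)
  return all_mat


all_ranks = get_rank_matrix(dmc_score_dict, 200000, algorithms=algs)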

#@title Plot aggregate ranks

import matplotlib.pyplot as plt
import numpy as np

# NOTE: `mean_ranks_all` and `DMC_COLOR_DICT` are not defined in this snippet.
# `mean_ranks_all[key]` is indexed below as a (num_algs, num_algs) array whose
# row i holds the fraction of times algs[i] attains each rank.

keys = algs
labels = list(range(1, len(keys)+1))
width = 1.0       # the width of the bars: can also be len(x) sequence
yticks = np.arange(0, 101, 20)  # Tick labels in percent, shared by both panels.

fig, axes = plt.subplots(ncols=2, figsize=(2.9 * 2, 3.6))

for main_idx, main_key in enumerate(['100k', '500k']):
  ax = axes[main_idx]
  mean_ranks = mean_ranks_all[main_key]
  # Stack one bar segment per algorithm on top of the previous ones.
  bottom = np.zeros_like(mean_ranks[0])
  for i, key in enumerate(algs):
    label = key if main_idx == 0 else None  # Add legend entries only once.
    ax.bar(labels, mean_ranks[i], width, label=label,
           color=DMC_COLOR_DICT[key], bottom=bottom, alpha=0.9)
    bottom += mean_ranks[i]

  # Fix the tick positions before assigning the tick labels.
  ax.set_yticks(yticks * 0.01)
  if main_idx == 0:
    ax.set_ylabel('Fraction (in %)', size='x-large')
    ax.set_yticklabels(yticks, size='large')
  else:
    ax.set_yticklabels([])
  ax.set_xlabel('Ranking', size='x-large')
  ax.set_xticks(labels)
  ax.set_xticklabels(labels, size='large')
  ax.set_title(main_key + ' steps', size='x-large', y=0.95)
  for side in ('top', 'right', 'bottom', 'left'):
    ax.spines[side].set_visible(False)
  left = main_idx == 0  # Only the left panel shows y ticks and labels.
  ax.tick_params(axis='both', which='both', bottom=False, top=False,
                 left=left, right=False, labeltop=False,
                 labelbottom=True, labelleft=left, labelright=False)

fig.legend(loc='upper center', fancybox=True, ncol=3, fontsize='x-large')
fig.subplots_adjust(top=0.72, wspace=0.03)
plt.show()
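
`get_rank_matrix` above calls `subsample_scores_mat`, which is not defined in this comment. A minimal sketch of such a helper follows, assuming each score matrix has shape `(num_runs, num_tasks)` and that runs are resampled independently for every task. The body is an assumption for illustration only, and the toy data is made up.

```python
import numpy as np


def subsample_scores_mat(score_mat, num_samples=3, replace=False):
  """Resamples runs independently per task (hypothetical helper).

  Args:
    score_mat: Array of shape (num_runs, num_tasks).
    num_samples: Number of runs to draw for each task.
    replace: Whether to sample runs with replacement.

  Returns:
    Array of shape (num_samples, num_tasks).
  """
  score_mat = np.asarray(score_mat)
  total_runs, num_tasks = score_mat.shape
  subsampled = np.empty((num_samples, num_tasks), dtype=score_mat.dtype)
  for task in range(num_tasks):
    # Draw run indices separately for each task.
    indices = np.random.choice(total_runs, size=num_samples, replace=replace)
    subsampled[:, task] = score_mat[indices, task]
  return subsampled


# Toy check: 3 runs on 2 tasks, resampled to 5 runs per task.
toy_scores = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
resampled = subsample_scores_mat(toy_scores, num_samples=5, replace=True)
print(resampled.shape)
```

With `replace=True`, `num_samples` may exceed the number of available runs, which is what allows `get_rank_matrix` to draw `n=200000` samples from only a handful of seeds.
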
agarwl commented 4 days ago

See the open-sourced colab: https://colab.research.google.com/drive/1a0pSD-1tWhMmeJeeoyZM1A-HCW3yf1xR#scrollTo=CJzoQDw3zXtN
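
The plotting code reads `mean_ranks_all`, which is not defined in this thread. One plausible way to build it, assuming it holds the output of `get_rank_matrix` averaged over tasks for each training budget, is sketched below. The function name `aggregate_ranks`, the `rank_matrices` dict and its values are placeholders for illustration, not results from the paper.

```python
import numpy as np


def aggregate_ranks(all_ranks):
  """Averages per-task rank distributions over tasks.

  Args:
    all_ranks: Array of shape (num_tasks, num_algs, num_algs), where entry
      [t, i, r] is the fraction of samples in which algorithm i has rank r
      on task t (the layout returned by get_rank_matrix).

  Returns:
    Array of shape (num_algs, num_algs) with the mean fraction per rank.
  """
  return np.mean(all_ranks, axis=0)


# Placeholder rank matrices for 2 tasks and 2 algorithms (not real results).
rank_matrices = {
    '100k': np.array([[[0.8, 0.2], [0.2, 0.8]],
                      [[0.4, 0.6], [0.6, 0.4]]]),
    '500k': np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[0.5, 0.5], [0.5, 0.5]]]),
}
mean_ranks_all = {
    key: aggregate_ranks(mat) for key, mat in rank_matrices.items()
}
print(mean_ranks_all['100k'])
```

Each column of an averaged matrix sums to 1, because every sample assigns exactly one algorithm to each rank. This is why the stacked bars in the plot reach 100%.
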

amantuer commented 4 days ago

Thanks! That's helped a lot!! @agarwl