corl-team / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
https://corl-team.github.io/CORL
Apache License 2.0
477 stars 20 forks

[Question] Is there a description of the testing methodology and the meaning of the scores in the performance tables? #13

Closed jamartinh closed 11 months ago

jamartinh commented 11 months ago

Hi, on the GitHub project main page there are several tables with performance results comparing several algorithms.

Is there documentation on where these numbers come from and what they mean, as well as the methodology used?

Thanks a lot!

vkurenkov commented 11 months ago

Hi,

Did you check https://github.com/corl-team/CORL/tree/main/results ?

jamartinh commented 11 months ago

Thanks @vkurenkov for the link, however I guess I will need a little help to get started.

Are the results the average rewards obtained?

For instance, let's take the Table of Gym-Mujoco:

https://github.com/corl-team/CORL#gym-mujoco

Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT

hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 63.02 ± 4.56 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61

These numbers seem very low to me to be actual Hopper rewards.

Could you please give me some help?

Lots of thanks!

vkurenkov commented 11 months ago

Yes, sure. The results are averaged across seeds (best = the best evaluation point within each run; final = the last evaluation point of each run).
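A minimal sketch of that aggregation, using made-up evaluation curves (the array shapes and numbers are illustrative, not CORL's actual logging format):

```python
import numpy as np

# Hypothetical evaluation scores: rows are seeds, columns are evaluation points.
eval_scores = np.array([
    [40.0, 55.0, 60.0, 58.0],   # seed 0
    [42.0, 50.0, 63.0, 61.0],   # seed 1
    [38.0, 52.0, 59.0, 62.0],   # seed 2
])

# "best":  take the best evaluation point within each run, then average over seeds
best = eval_scores.max(axis=1).mean()    # mean of [60, 63, 62]

# "final": take the last evaluation point of each run, then average over seeds
final = eval_scores[:, -1].mean()        # mean of [58, 61, 62]
```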

Note that these are normalized scores, not the actual reward provided by the environment:

normalized_score = 100 * (score - REF_MIN_SCORE) / (REF_MAX_SCORE - REF_MIN_SCORE)

For more details, check the section on normalized scores here: https://github.com/Farama-Foundation/D4RL
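The formula above, written out in Python. The reference values are the random/expert Hopper returns D4RL reports; treat them as illustrative and look them up in `d4rl/infos.py` for the exact numbers per task:

```python
# D4RL-style score normalization: 0 corresponds to a random policy,
# 100 to an expert policy. Reference values below are the ones D4RL
# lists for Hopper (assumed here, verify against d4rl/infos.py).
REF_MIN_SCORE = -20.272305   # return of a random policy on Hopper
REF_MAX_SCORE = 3234.3       # return of an expert policy on Hopper

def normalized_score(score: float) -> float:
    """Map a raw environment return to the 0-100 D4RL scale."""
    return 100 * (score - REF_MIN_SCORE) / (REF_MAX_SCORE - REF_MIN_SCORE)
```

With these references, a raw Hopper return of roughly 1700-1750 maps to the low-50s region you see in the BC column, which is why the table entries look small compared to raw rewards.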

jamartinh commented 11 months ago

Many thanks!