Avalon-Benchmark / avalon

A 3D video game environment and benchmark designed from scratch for reinforcement learning research
https://generallyintelligent.com/avalon/
GNU General Public License v3.0
175 stars 16 forks

Evaluation World IDs #27

Closed paischer101 closed 1 year ago

paischer101 commented 1 year ago

The world IDs that are being downloaded in your evaluation script

https://github.com/Avalon-Benchmark/avalon/blob/71a9310b25d91af2c8bb9e4cdc1bcc5393ff7163/notebooks/avalon_results.sync.py#L58

do not match the world IDs you provide for your downloadable worlds here:

https://github.com/Avalon-Benchmark/avalon/blob/main/docs/avalon_baselines.md#reproducing-our-paper-results

They differ substantially: the IDs from your evaluation script are 6 digits (e.g. '542584'), while the downloaded worlds have 12-digit IDs (e.g. '201108062007'). Furthermore, the downloadable set contains 1024 worlds, but only 1000 worlds are listed in the scores downloaded by the evaluation script.

It seems that either the IDs have changed at some point, or the downloadable set of worlds is not the one used for your evaluation. Maybe I am missing something here?
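For anyone hitting the same mismatch, a quick sanity check along these lines can confirm whether the two ID sets line up. The example IDs and variable names below are purely illustrative; in practice you would populate the sets from the downloaded scores file and the names of the extracted world folders.

```python
# Hypothetical sanity check: do the world IDs referenced by the
# evaluation scores match the IDs of the downloaded world set?
# The literal IDs here are made-up examples, not real Avalon IDs.

eval_score_ids = {"542584", "107953"}              # 6-digit IDs from the scores file
downloaded_ids = {"201108062007", "201108062008"}  # 12-digit world folder names

missing = eval_score_ids - downloaded_ids  # scored worlds that were not downloaded
extra = downloaded_ids - eval_score_ids    # downloaded worlds with no score entry

print(f"IDs in scores but not downloaded: {len(missing)}")
print(f"Downloaded worlds without scores: {len(extra)}")
```

If both sets were consistent, `missing` and `extra` would be empty; with the ID sets described in this issue, every ID lands in one of the two difference sets.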

brydenfogelman commented 1 year ago

Hi @paischer101, thanks for flagging the issue! We did two rounds of data collection, and it's highly likely some links are referencing data collected in the previous round. To the best of my knowledge, these should be the links that point to the data used in the paper. We'll need to do some testing on our end to make sure!

paischer101 commented 1 year ago

Hi @brydenfogelman, thank you for the response! It seems that the links to the evaluation worlds in the README are not up to date:

https://github.com/Avalon-Benchmark/avalon#resources
https://github.com/Avalon-Benchmark/avalon/blob/main/docs/avalon_baselines.md#reproducing-our-paper-results

Both of the above links download evaluation worlds with 12-digit IDs, while the links you provided in your answer contain 6-digit IDs, which is in line with the IDs downloaded by the evaluation notebook. I will re-run the evaluation on the worlds you provided and will close this issue if everything works out.

paischer101 commented 1 year ago

Evaluation works with the worlds you provided; closing this issue.