HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL
MIT License

Document how to reproduce results & contribution guidelines #16

Closed: AdamGleave closed this issue 5 years ago

AdamGleave commented 5 years ago

Add signposts to relevant parts of our codebase, with an eye to people wanting to replicate experiments. Also release some of our pre-trained adversarial policies.

codecov[bot] commented 5 years ago

Codecov Report

Merging #16 into master will not change coverage. The diff coverage is n/a.

@@          Coverage Diff           @@
##           master     #16   +/-   ##
======================================
  Coverage    58.2%   58.2%           
======================================
  Files          57      57           
  Lines        4883    4883           
======================================
  Hits         2842    2842           
  Misses       2041    2041
Flag         Coverage Δ
#aprl        10.87% <ø> (ø)
#modelfree   51.21% <ø> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update afc6134...a873369.

AdamGleave commented 5 years ago

Thanks for the comments @decodyng. I agree it'd be good to have someone not involved in developing the project test-drive the instructions. Perhaps @wingsweihua could provide feedback on this? (This was prompted by an e-mail from him.)

wingsweihua commented 5 years ago

Thanks for the doc update. I didn't run it on Docker, but the training works fine in general. Two small issues:

  1. There may be an error in adversarial-policies/experiments/modelfree/dec2018replication.sh:

line 15: python -m modelfree.multi_train with dec2018rep

I guess this command is from an old version; it should now be modelfree.multi.train, right?

  2. For a single run, modelfree.train just runs sumo-ants. How do I configure the experiments to run other settings?

decodyng commented 5 years ago

@wingsweihua Because we're using Sacred as our experiment setup framework, you can use the "with param=val" syntax to change parameters. You can see where the default values for modelfree.train are set here: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/modelfree/train.py#L219

So, if you're running train directly, you could, for example, add with env_name=multicomp/KickAndDefend-v0 to the train command on the command line. If you're calling train from within Python code, you can pass in a config_updates dictionary of the form {"env_name": "multicomp/KickAndDefend-v0"}.
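
To make that concrete, here is a minimal sketch of both options. The import path and experiment object name (train_ex) are assumptions based on the file linked above; check src/modelfree/train.py for the actual names.

```python
# Minimal sketch of the two override styles described above.
# ASSUMPTION: the Sacred Experiment object in modelfree.train is exposed as
# `train_ex`; check src/modelfree/train.py for the actual name.

# 1) Command line (Sacred's `with` syntax), no Python needed:
#
#    python -m modelfree.train with env_name=multicomp/KickAndDefend-v0
#
# 2) From Python, via Sacred's config_updates argument:
from modelfree.train import train_ex

run = train_ex.run(config_updates={"env_name": "multicomp/KickAndDefend-v0"})
print(run.result)  # return value of the experiment's main function
```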

wingsweihua commented 5 years ago

Thanks! I'm not familiar with Sacred and this really helps.

AdamGleave commented 5 years ago

> 1. There may be an error in adversarial-policies/experiments/modelfree/dec2018replication.sh:
>
> line 15: python -m modelfree.multi_train with dec2018rep
>
> I guess this command is from an old version; it should now be modelfree.multi.train, right?

Thanks, you're right, that was old code; I've fixed it now.
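
For anyone else replicating from an older checkout, a sketch of what line 15 of experiments/modelfree/dec2018replication.sh should look like after the fix (check the current script for the exact line):

```sh
# experiments/modelfree/dec2018replication.sh -- line 15 after the fix:
# the entry point is modelfree.multi.train, not the old modelfree.multi_train
python -m modelfree.multi.train with dec2018rep
```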