adaptive-intelligent-robotics / QDax

Accelerated Quality-Diversity
https://qdax.readthedocs.io/en/latest/
MIT License
258 stars, 42 forks

141 add me pbt #143

Closed ranzenTom closed 1 year ago

ranzenTom commented 1 year ago

Related issues: https://github.com/adaptive-intelligent-robotics/QDax/issues/141

This PR adds the population-based training (PBT) algorithm and the MAP-Elites PBT algorithm (recently accepted at ICLR). Both methods work with either TD3 or SAC.
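For anyone unfamiliar with PBT, here is a minimal sketch of a generic exploit/explore round. It is only an illustration of the general idea, not the code in this PR: `Member`, `pbt_step`, `fraction` and `factors` are hypothetical names, a single float stands in for the network parameters, and the learning rate is assumed to be the only tuned hyperparameter.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Member:
    """Hypothetical population member; not a QDax type."""

    weights: float        # stand-in for an agent's network parameters
    learning_rate: float  # hyperparameter that PBT adapts
    score: float = 0.0    # most recent evaluation result


def pbt_step(
    population: Sequence[Member],
    rng: random.Random,
    fraction: float = 0.25,
    factors: Sequence[float] = (0.8, 1.25),
) -> List[Member]:
    """One exploit/explore round over an already-evaluated population."""
    assert len(population) >= 2 and 0.0 < fraction <= 0.5
    n_swap = max(1, int(len(population) * fraction))
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    top = ranked[:n_swap]
    middle = ranked[n_swap:-n_swap]
    bottom = ranked[-n_swap:]

    children = []
    for _ in bottom:
        parent = rng.choice(top)     # exploit: copy a top performer
        factor = rng.choice(factors)  # explore: perturb its hyperparameter
        # The child keeps the parent's score until it is evaluated again.
        children.append(
            replace(parent, learning_rate=parent.learning_rate * factor)
        )
    return top + middle + children
```

In a real training loop each member would be trained and evaluated between calls to `pbt_step`, so the scores used for ranking stay current.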

This PR introduces:

Checks

codecov-commenter commented 1 year ago

Codecov Report

Merging #143 (c8d0c2f) into develop (7cfd5bc) will decrease coverage by 0.14%. The diff coverage is 91.43%.

@@             Coverage Diff             @@
##           develop     #143      +/-   ##
===========================================
- Coverage    92.41%   92.28%   -0.14%     
===========================================
  Files          105      116      +11     
  Lines         5910     6763     +853     
===========================================
+ Hits          5462     6241     +779     
- Misses         448      522      +74     
| Impacted Files | Coverage Δ | |
|---|---|---|
| qdax/core/containers/mapelites_repertoire.py | 85.71% <ø> (ø) | |
| qdax/core/containers/mome_repertoire.py | 98.14% <ø> (ø) | |
| qdax/core/distributed_map_elites.py | 100.00% <ø> (ø) | |
| qdax/core/neuroevolution/sac_td3_utils.py | 100.00% <ø> (ø) | |
| qdax/environments/exploration_wrappers.py | 32.97% <0.00%> (ø) | |
| qdax/environments/locomotion_wrappers.py | 85.49% <ø> (ø) | |
| qdax/environments/humanoidtrap.py | 18.30% <18.30%> (ø) | |
| qdax/environments/__init__.py | 86.79% <60.00%> (-3.01%) | :arrow_down: |
| qdax/baselines/pbt.py | 76.47% <76.47%> (ø) | |
| qdax/baselines/sac_pbt.py | 96.46% <96.46%> (ø) | |
| ... and 24 more | | |

... and 2 files with indirect coverage changes

limbryan commented 1 year ago

It looks like this restructuring finally resolves our longstanding mismatch between the TD3 and SAC algorithm structures, which cuts code duplication and gives us a more uniform, modular layout! Should we remove the now-redundant do_iteration and warmstart buffer functions from mdp_utils? Or get rid of mdp_utils completely and move whatever is left into sac_td3_utils? Both TD3 and SAC now take do_iteration and the warmstart buffer from sac_td3_utils.
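To make the point about sharing concrete, here is a rough sketch of why one helper can serve both algorithms: the helper only needs an algorithm-specific play step passed in as a function. Everything here is hypothetical and heavily simplified (a plain list as the buffer, an int as the state); `warmstart_buffer_sketch`, `td3_play_step` and `sac_play_step` are illustrative names, not the signatures in sac_td3_utils.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")
Transition = TypeVar("Transition")


def warmstart_buffer_sketch(
    buffer: List[Transition],
    state: State,
    num_steps: int,
    play_step_fn: Callable[[State], Tuple[State, Transition]],
) -> Tuple[List[Transition], State]:
    """Fill a buffer by repeatedly calling an algorithm-specific play step."""
    for _ in range(num_steps):
        state, transition = play_step_fn(state)
        buffer = buffer + [transition]  # build a new list, leave the input untouched
    return buffer, state


# Toy play steps standing in for the TD3 and SAC versions (assumed shapes).
def td3_play_step(state: int) -> Tuple[int, Tuple[str, int]]:
    return state + 1, ("td3", state)


def sac_play_step(state: int) -> Tuple[int, Tuple[str, int]]:
    return state + 1, ("sac", state)


td3_buffer, td3_state = warmstart_buffer_sketch([], 0, 3, td3_play_step)
sac_buffer, sac_state = warmstart_buffer_sketch([], 0, 3, sac_play_step)
```

With this shape, the algorithm-specific code lives only in the play step, so a second copy of the helper in another module would add nothing.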