RobertTLange / gymnax-blines

Baselines for gymnax 🤖
Apache License 2.0

Add A2C implementation #2

Open RobertTLange opened 2 years ago

RobertTLange commented 2 years ago

Reminder: to-do for after my internship.

Mostly for the meta-bandit and gridworld tasks.

DavidSlayback commented 2 years ago

Suggestion: you could probably just implement it as PPO with fixed hyperparameters (GAE lambda = 1, no advantage normalization, 1 epoch, 1 minibatch, no value clipping), as per "A2C is a Special Case of PPO".
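To make two of those settings concrete, here is a rough pure-Python sketch (not code from gymnax-blines; the function names and the `clip_eps` default are made up for illustration, and episode terminations are ignored for brevity). It checks that GAE with lambda = 1 gives the same advantages as the bootstrapped return minus the value baseline that A2C uses, and that the clipped PPO term reduces to the plain advantage when the probability ratio is 1, which is the case on the first and only pass over a fresh batch.

```python
def gae_advantages(rewards, values, last_value, gamma, lam):
    """GAE over a single rollout segment (assumes no episode ends inside it)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def nstep_advantages(rewards, values, last_value, gamma):
    """A2C-style advantage: bootstrapped discounted return minus V(s_t)."""
    advantages = [0.0] * len(rewards)
    ret = last_value
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (hypothetical helper, to be maximized)."""
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


# Toy rollout with made-up numbers.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]
gae = gae_advantages(rewards, values, last_value=0.2, gamma=0.99, lam=1.0)
a2c = nstep_advantages(rewards, values, last_value=0.2, gamma=0.99)
print(all(abs(g - a) < 1e-9 for g, a in zip(gae, a2c)))
```

With 1 epoch and 1 minibatch the policy has not changed when the loss is evaluated, so the ratio is 1, the clip is inactive, and the gradient of `ratio * advantage` matches the usual policy-gradient term. The paper also matches other details (for example the optimizer), which this sketch does not cover.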

RobertTLange commented 2 years ago

Good point, I didn't know about this equivalence. For the meta-RL setups I may have to write some extra logic but will try to keep things minimal.