Closed jamesheald closed 6 months ago
Hello, you can write a class that derives from the on-policy algorithm class, `OnPolicyAlgorithm` (that's what we do in sb3-contrib), or fork SB3 if more customisation is needed.
Thanks for the speedy response. I will dig into the details of the sb3 code to see if I need to fork or not (I suspect I do).
❓ Question
I want to modify an algorithm (e.g. PPO) by training additional networks to optimize custom auxiliary losses. These additional networks will be distinct from the standard value functions/policies learned by the Stable-Baselines algorithms, but will perhaps interact with them (e.g. by being incorporated into the learned policy).
Merely customizing a policy (as described in https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html) does not seem expressive enough for my needs (if I am not mistaken), as I need to define new losses/optimisers and their associated architectures.
What is the cleanest, most elegant way to modify an algorithm in this way?