DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

How to elegantly modify an algorithm by adding new architectures trained with custom losses? #1881

Closed jamesheald closed 6 months ago

jamesheald commented 6 months ago

❓ Question

I want to modify an algorithm (e.g. PPO) by training additional networks to optimize custom auxiliary losses. These additional networks will be distinct from the standard value functions/policies learned by the stable-baseline algorithms, but will perhaps interact with them (e.g. by being incorporated into the learned policy).

Merely customizing a policy (as described in https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html) does not seem expressive enough for my needs (if I am not mistaken), since I need to define new losses/optimisers and the architectures associated with them. See the snippet below for the kind of customization that page covers.
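For context, the custom-policy route linked above mainly covers choices such as the network architecture and activation function passed through policy_kwargs. A typical example (CartPole-v1 used only as a placeholder environment):

```python
import torch as th

from stable_baselines3 import PPO

# Customizes the actor/critic architecture, but adds no new losses or optimizers.
policy_kwargs = dict(
    activation_fn=th.nn.ReLU,
    net_arch=dict(pi=[64, 64], vf=[64, 64]),
)
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
```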

What is the cleanest, most elegant way to modify an algorithm in this way?


araffin commented 6 months ago

Hello, you can write a class that derives from the on-policy algorithm class (that's what we do in SB3-Contrib), or fork SB3 if more customisation is needed.
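To make the suggestion concrete, below is a minimal sketch of one way to do it, deriving from PPO (itself built on OnPolicyAlgorithm) and overriding train(). The auxiliary network, its return-regression loss, and the aux_lr argument are illustrative assumptions rather than part of stable-baselines3, and the sketch assumes a flat Box observation space.

```python
import numpy as np
import torch as th
from torch import nn

from stable_baselines3 import PPO


class PPOWithAuxLoss(PPO):
    """PPO plus an extra network trained with its own optimizer and loss."""

    def __init__(self, *args, aux_lr: float = 1e-3, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumes a flat Box observation space (illustrative only).
        obs_dim = int(np.prod(self.observation_space.shape))
        self.aux_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        ).to(self.device)
        self.aux_optimizer = th.optim.Adam(self.aux_net.parameters(), lr=aux_lr)

    def train(self) -> None:
        # Standard PPO update (policy and value-function losses).
        super().train()
        # Extra pass over the rollout buffer to optimise the auxiliary loss.
        for rollout_data in self.rollout_buffer.get(self.batch_size):
            pred = self.aux_net(rollout_data.observations)
            # Placeholder auxiliary objective: regress the empirical returns.
            aux_loss = nn.functional.mse_loss(pred.flatten(), rollout_data.returns)
            self.aux_optimizer.zero_grad()
            aux_loss.backward()
            self.aux_optimizer.step()
        self.logger.record("train/aux_loss", aux_loss.item())
```

Usage is then the same as plain PPO, e.g. PPOWithAuxLoss("MlpPolicy", "CartPole-v1").learn(10_000). If the auxiliary network has to feed back into the policy itself, that typically also means a custom ActorCriticPolicy subclass or, as noted above, a fork.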

jamesheald commented 6 months ago

Thanks for the speedy response. I will dig into the details of the sb3 code to see if I need to fork or not (I suspect I do).