DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

How to elegantly modify an algorithm by adding new architectures trained with custom losses? #1881

Closed jamesheald closed 6 months ago

jamesheald commented 6 months ago

❓ Question

I want to modify an algorithm (e.g. PPO) by training additional networks to optimize custom auxiliary losses. These additional networks will be distinct from the standard value functions/policies learned by the stable-baseline algorithms, but will perhaps interact with them (e.g. by being incorporated into the learned policy).

Merely customizing a policy (as described in https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html) does not seem expressive enough for my needs (if I am not mistaken), since I need to define new losses/optimisers and the architectures associated with them. See the snippet below for the kind of customization that page covers.
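For context, the custom-policy route linked above mainly covers choices such as the network architecture and activation function passed through policy_kwargs. A typical example (CartPole-v1 used only as a placeholder environment):

```python
import torch as th

from stable_baselines3 import PPO

# Customizes the actor/critic architecture, but adds no new losses or optimizers.
policy_kwargs = dict(
    activation_fn=th.nn.ReLU,
    net_arch=dict(pi=[64, 64], vf=[64, 64]),
)
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
```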

What is the cleanest, most elegant way to modify an algorithm in this way?


araffin commented 6 months ago

Hello, you can write a class that derives from the on-policy algorithm class (that's what we do in SB3-Contrib), or fork SB3 if more customisation is needed.
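To make the suggestion concrete, below is a minimal sketch of one way to do it, deriving from PPO (itself built on OnPolicyAlgorithm) and overriding train(). The auxiliary network, its return-regression loss, and the aux_lr argument are illustrative assumptions rather than part of stable-baselines3, and the sketch assumes a flat Box observation space.

```python
import numpy as np
import torch as th
from torch import nn

from stable_baselines3 import PPO


class PPOWithAuxLoss(PPO):
    """PPO plus an extra network trained with its own optimizer and loss."""

    def __init__(self, *args, aux_lr: float = 1e-3, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumes a flat Box observation space (illustrative only).
        obs_dim = int(np.prod(self.observation_space.shape))
        self.aux_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        ).to(self.device)
        self.aux_optimizer = th.optim.Adam(self.aux_net.parameters(), lr=aux_lr)

    def train(self) -> None:
        # Standard PPO update (policy and value-function losses).
        super().train()
        # Extra pass over the rollout buffer to optimise the auxiliary loss.
        for rollout_data in self.rollout_buffer.get(self.batch_size):
            pred = self.aux_net(rollout_data.observations)
            # Placeholder auxiliary objective: regress the empirical returns.
            aux_loss = nn.functional.mse_loss(pred.flatten(), rollout_data.returns)
            self.aux_optimizer.zero_grad()
            aux_loss.backward()
            self.aux_optimizer.step()
        self.logger.record("train/aux_loss", aux_loss.item())
```

Usage is then the same as plain PPO, e.g. PPOWithAuxLoss("MlpPolicy", "CartPole-v1").learn(10_000). If the auxiliary network has to feed back into the policy itself, that typically also means a custom ActorCriticPolicy subclass or, as noted above, a fork.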

jamesheald commented 6 months ago

Thanks for the speedy response. I will dig into the details of the sb3 code to see if I need to fork or not (I suspect I do).