fhswf / MLPro-Int-SB3

MLPro: Integration StableBaselines3
https://mlpro-int-sb3.readthedocs.io
Apache License 2.0

Maskable PPO from SB3-Contrib #8

Closed steveyuwono closed 3 weeks ago

steveyuwono commented 6 months ago

Description/Motivation

Add the Maskable PPO algorithm provided by SB3-Contrib, which is an extension of SB3, to the pool of objects.

Some methods from the basic wrapper need to be readjusted.

On a personal note @steveyuwono, please refer to the implementation of SSD4OR-RL projects!

Task list

Related issues

...

Cross references

Documentation of Maskable PPO

steveyuwono commented 2 months ago

Discussion on 06.09.2024: [image]

detlefarend commented 2 months ago

Hi @steveyuwono @laxmikantbaheti, regarding your discussion about transferring additional data from the env to the agent: class bf.systems.State already provides kwargs. This means an env or system can simply add additional data to a state. That data can then be consumed within the custom method _compute_action() without any further parameters.

Steve's scenario (agent manages an internal action consumption mechanism) -> purely internal detail of a policy

Laxmikant's scenario (env provides masks for the agent) -> just add as kwarg within the env/system
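For Laxmikant's scenario, a minimal sketch of the env side could look like the following. The State class here is only a hypothetical stand-in that models the kwargs handling described above, not the real bf.systems.State; simulate_reaction and the kwarg name action_mask are also assumptions made for illustration.

```python
class State:
    """Hypothetical stand-in for bf.systems.State; only kwargs handling is modelled."""

    def __init__(self, p_values, **p_kwargs):
        self.values = list(p_values)
        self._kwargs = dict(p_kwargs)

    def get_kwargs(self):
        # Returns the additional data attached by the env/system
        return self._kwargs


def simulate_reaction(p_values, p_blocked_actions, p_num_actions):
    """Hypothetical env step: builds the next state and attaches an action mask as kwarg."""
    # True = action allowed, False = action masked out
    mask = [i not in p_blocked_actions for i in range(p_num_actions)]
    return State(p_values, action_mask=mask)


state = simulate_reaction([0.5, 1.0], p_blocked_actions={1}, p_num_actions=3)
print(state.get_kwargs()["action_mask"])  # [True, False, True]
```

The agent side then only needs the state object itself to read the mask, so no extra parameter has to be passed to _compute_action().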

I think we can extend the existing wrapper with two new custom methods, _get_mask() and _add_to_mask( p_action ), plus some additional code in _compute_action().

What do you think?
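A rough sketch of how such an extension might fit together, covering both scenarios. Everything here is an assumption for illustration: the State stand-in, the class MaskedPolicySketch, the kwarg name action_mask, and the signature of _get_mask (which takes the state here). The random choice is only a placeholder for the actual model call; SB3-Contrib's MaskablePPO.predict() accepts an action_masks argument that would receive the combined mask.

```python
import random


class State:
    """Hypothetical stand-in for bf.systems.State; only kwargs handling is modelled."""

    def __init__(self, p_values, **p_kwargs):
        self.values = list(p_values)
        self._kwargs = dict(p_kwargs)

    def get_kwargs(self):
        return self._kwargs


class MaskedPolicySketch:
    """Hypothetical wrapper extension, not the MLPro-Int-SB3 implementation."""

    def __init__(self, p_num_actions, p_seed=0):
        self._num_actions = p_num_actions
        # Internal mask (Steve's scenario): True = action still available
        self._internal_mask = [True] * p_num_actions
        self._rng = random.Random(p_seed)

    def _add_to_mask(self, p_action):
        # Marks an action as consumed so it is masked out from now on
        self._internal_mask[p_action] = False

    def _get_mask(self, p_state):
        # Env-provided mask (Laxmikant's scenario); defaults to "all allowed"
        env_mask = p_state.get_kwargs().get("action_mask", [True] * self._num_actions)
        # An action is allowed only if both env and internal mask allow it
        return [e and i for e, i in zip(env_mask, self._internal_mask)]

    def _compute_action(self, p_state):
        mask = self._get_mask(p_state)
        allowed = [idx for idx, ok in enumerate(mask) if ok]
        if not allowed:
            raise RuntimeError("No valid action left under the current mask")
        # Placeholder for the model call, e.g. predict(obs, action_masks=mask)
        action = self._rng.choice(allowed)
        self._add_to_mask(action)
        return action


state = State([0.0], action_mask=[True, False, True])
policy = MaskedPolicySketch(p_num_actions=3)
picks = [policy._compute_action(state), policy._compute_action(state)]
print(sorted(picks))  # [0, 2]
```

Keeping the internal mask inside the policy and the env mask inside the state kwargs means the two scenarios stay independent and are only combined in _get_mask().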

detlefarend commented 2 months ago

@steveyuwono you can use the new method State.get_kwargs(), but this raises the minimum required version of MLPro to > 1.9.0.