steveyuwono commented 6 months ago

Description/Motivation

Add the Maskable PPO algorithm provided by SB3-Contrib, which is an extension of SB3, to the pool of objects.

Some methods from the basic wrapper need to be readjusted.

Personal note for @steveyuwono: please refer to the implementation in the SSD4OR-RL projects!

Task list

[ ] 1. Do this
[ ] 2. Do that

Related issues

...

Cross references

Documentation of Maskable PPO

steveyuwono commented 2 months ago

discussion on 06.09.2024:

detlefarend commented 2 months ago

Hi @steveyuwono @laxmikantbaheti, regarding your discussion about transferring additional data from the env to the agent: class bf.systems.State already provides kwargs. That means an env or system can simply add additional data to a state. This data can then be consumed within the custom method _compute_action() without any further parameters.

Steve's scenario (agent manages an internal action consumption mechanism) -> purely internal detail of a policy
Laxmikant's scenario (env provides masks for the agent) -> just add as kwarg within the env/system
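
To make the env-side scenario concrete, here is a minimal sketch of a state carrying a mask as a kwarg. `DemoState`, `make_state` and the kwarg name `action_mask` are hypothetical stand-ins for illustration only; they are not the actual bf.systems.State API, whose constructor may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DemoState:
    """Hypothetical stand-in for a state object that carries extra kwargs."""
    values: List[float]
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def get_kwargs(self) -> Dict[str, Any]:
        # Returns the additional data attached by the env/system
        return self.kwargs


def make_state(values: List[float], free_slots: List[bool]) -> DemoState:
    """Env side: attach a boolean action mask (True = action allowed).

    The key "action_mask" is an assumed naming convention.
    """
    return DemoState(values=values, kwargs={"action_mask": list(free_slots)})


# Agent side: read the mask back from the state, no extra parameters needed
state = make_state([0.2, 0.7], free_slots=[True, False, True])
mask = state.get_kwargs().get("action_mask")
```

A state created without a mask simply returns an empty dict, so the agent can detect the missing key and fall back to its own mechanism.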

I think we can extend the existing wrapper by two new custom methods _get_mask(), _add_to_mask( p_action ) and some additional code in _compute_action():

_get_mask() and _add_to_mask() can be implemented in custom child classes inherited from the wrapper (Steve's scenario). Alternatively, we implement them ourselves and add a parameter p_action_masking : bool to the wrapper.
Extension of _compute_action(): get the kwargs from the state. If a mask is provided, hand it over to SB3. If not, call _get_mask() and hand its result over to SB3. The resulting action is then passed to _add_to_mask().
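
A rough sketch of how this could fit together, under several assumptions: `MaskingWrapperSketch` and `StubMaskablePolicy` are hypothetical names, the kwarg key `action_mask` is an assumed convention, and the "each action can be used once" bookkeeping is only an example of an internal consumption mechanism. The stub mimics only the `action_masks` keyword of SB3-Contrib's `MaskablePPO.predict()`; it is not the real model.

```python
import random
from typing import Any, Dict, List, Optional


class StubMaskablePolicy:
    """Stand-in for a maskable policy: picks a random allowed action."""

    def __init__(self, n_actions: int):
        self._n_actions = n_actions

    def predict(self, observation, action_masks: Optional[List[bool]] = None):
        if action_masks is None:
            action_masks = [True] * self._n_actions
        allowed = [a for a, ok in enumerate(action_masks) if ok]
        return random.choice(allowed), None  # (action, recurrent state)


class MaskingWrapperSketch:
    """Hypothetical wrapper with the two proposed custom methods."""

    def __init__(self, policy, n_actions: int, p_action_masking: bool = True):
        self._policy = policy
        self._action_masking = p_action_masking
        self._mask = [True] * n_actions

    def _get_mask(self) -> List[bool]:
        # Custom method: return the internally managed mask
        return list(self._mask)

    def _add_to_mask(self, p_action: int) -> None:
        # Custom method: example bookkeeping, each action is usable once
        self._mask[p_action] = False

    def _compute_action(self, observation, state_kwargs: Dict[str, Any]) -> int:
        if not self._action_masking:
            action, _ = self._policy.predict(observation)
            return action
        # Prefer a mask provided by the env via the state's kwargs
        mask = state_kwargs.get("action_mask")
        if mask is None:
            mask = self._get_mask()
        action, _ = self._policy.predict(observation, action_masks=mask)
        self._add_to_mask(action)
        return action


wrapper = MaskingWrapperSketch(StubMaskablePolicy(3), n_actions=3)
first = wrapper._compute_action(None, {"action_mask": [False, True, False]})
second = wrapper._compute_action(None, {})  # falls back to _get_mask()
```

Here the chosen action is passed to `_add_to_mask()` in both branches, which is one reading of the proposal; a child class could restrict that to the internal-mask case.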

What do you think?

detlefarend commented 2 months ago

@steveyuwono you can use the new method State.get_kwargs(), but this raises the minimum required version of MLPro to > 1.9.0.

fhswf / MLPro-Int-SB3

Maskable PPO from SB3-Contrib #8

...