diambra / arena

DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation
https://docs.diambra.ai
Other
307 stars 22 forks source link

Major refactoring gymnasium et al #88

Closed alexpalms closed 11 months ago

alexpalms commented 11 months ago

Very nice, I think this is all in all a big improvement also code quality wise! One general thing: Dunno how well that is supported with python, but making a interface for the wrapper so you can only pass in classes that have the required methods (e.g step etc) would make this even more robust. This is what chatgpt suggests:

In Python, you can achieve this using abstract base classes (ABCs) provided by the abc module. The idea is to create an abstract base class that defines the step method as an abstract method. Then, any concrete class that inherits from this abstract base class must provide an implementation of the step method.

Here's how you can do it:

  1. Define the abstract base class:
from abc import ABC, abstractmethod

class StepInterface(ABC):

    @abstractmethod
    def step(self):
        pass
  1. Create concrete classes that implement the step method:
class ClassA(StepInterface):
    def step(self):
        print("ClassA step")

class ClassB(StepInterface):
    def step(self):
        print("ClassB step")

If you try to create an instance of a class that inherits from StepInterface but doesn't implement the step method, you'll get a TypeError.

  1. You can then create a list of these class instances and ensure they all have the step method:
objects = [ClassA(), ClassB()]

for obj in objects:
    obj.step()

This will ensure that all classes in the list have a step method. If a class doesn't implement the method, you'll find out when you try to create an instance of the class (not when you try to call the method), which can be a helpful way to catch errors early.

This suggestion is regarding the list of wrappers class to be provided as wrapper setting? If so, I see your point and it is interesting. There is a subtle thing though, in particular you can have different types of wrappers classes you can apply, like Observation Wrappers, Reward Wrappers or the more generic Gym Wrapper. Not all of them have the same method (e.g. the obs wrapper does not have any method of those at all, but only the observation method).

What I can do though, is to add a sanity check and make sure that all the classes in the list provided by the user are one of the gym wrappers class, what do you thing?

discordianfish commented 11 months ago

This suggestion is regarding the list of wrappers class to be provided as wrapper setting? If so, I see your point and it is interesting. There is a subtle thing though, in particular you can have different types of wrappers classes you can apply, like Observation Wrappers, Reward Wrappers or the more generic Gym Wrapper. Not all of them have the same method (e.g. the obs wrapper does not have any method of those at all, but only the observation method).

Now I'm thinking more about this, an wrapper basically wraps the whole env, so it takes an env and returns a new env, right? So we could possibly type additional_wrappers_list with gym.Env or gym.Wrapper.

That being said, now I'm wondering why we have additional_wrappers_list at all. Why not create a diambra env and then add a wrapper like this:

env = arena.make("doapp"...)
wrapped = Wrapper(env, ...)
alexpalms commented 11 months ago

This suggestion is regarding the list of wrappers class to be provided as wrapper setting? If so, I see your point and it is interesting. There is a subtle thing though, in particular you can have different types of wrappers classes you can apply, like Observation Wrappers, Reward Wrappers or the more generic Gym Wrapper. Not all of them have the same method (e.g. the obs wrapper does not have any method of those at all, but only the observation method).

Now I'm thinking more about this, an wrapper basically wraps the whole env, so it takes an env and returns a new env, right? So we could possibly type additional_wrappers_list with gym.Env or gym.Wrapper.

That being said, now I'm wondering why we have additional_wrappers_list at all. Why not create a diambra env and then add a wrapper like this:

env = arena.make("doapp"...)
wrapped = Wrapper(env, ...)

I follow you and I see your points and I understand your question.

In fact, it is possible to do exactly what you suggest in your code, applying wrappers after the environment has been created.

The use of the wrappers argument (previously named additional_wrappers_list) is key when you want to use precooked make functions for external Libs, like stable baselines 2 and 3 (make_sb3_env), as well as others we will interface.

See as an example this: https://github.com/diambra/arena/blob/fba8bed04cef2af8a5318b0f2e3c6d8fb818bd96/diambra/arena/stable_baselines3/make_sb3_env.py#L41

Since SB wraps the env in their own specific wrappers/classes, we want to have a way to insert custom wrappers in addition to the standard ones we provide after we apply ours (i.e. after our make call, line 41 in the link) but before RL lib specific ones are applied (i.e. line 48 in the link)

It is true that you can build the wrapping sequence for SB on your own, and thus use the wrapping sequence you prefer, but that is considered an advanced use case. The typical use cases already showed up in the discord support channel, highlighting the importance of having a pre-built interface with SB (and other libs) as well as being able to add wrappers before the lib-specific ones:

Let me know your thoughts!