Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Question] Providing type arguments to gymnasium.Env? #845

Closed AbhijeetKrishnan closed 10 months ago

AbhijeetKrishnan commented 10 months ago

Question

gymnasium.Env is a generic class that custom environments need to inherit from (as is explained in the custom environment creation tutorial). However, simply doing something like class CustomEnv(gymnasium.Env) leads to a typing error as caught by mypy -

Missing type parameters for generic type "Env"  [type-arg]
    class CustomEnv(gymnasium.Env):

Solving this would require providing the requisite type arguments, like class CustomEnv(gymnasium.Env[ObsType, ActType]), with appropriate types for ObsType and ActType. But these are usually defined in terms of spaces, and there's no clear way of obtaining the "inner type" of a space (i.e., the type of what Space.sample() would return) from the space itself. For example, if I have -

self.observation_space = gymnasium.spaces.Box(0, 1, shape=(3, 3))
self.action_space = gymnasium.spaces.Discrete(4)

How could I provide appropriate type arguments to gymnasium.Env in order to type it correctly?

Kallinteris-Andreas commented 10 months ago

Example from MuJoCo environments https://github.com/Farama-Foundation/Gymnasium/blob/443b1940f11087280663e884edea571f47f72413/gymnasium/envs/mujoco/mujoco_env.py#L30

pseudo-rnd-thoughts commented 10 months ago

The ObsType and ActType should be equal to the type of the observation and action samples, e.g., a numpy NDArray for MuJoCo. The Env documentation should be updated to discuss this.

AbhijeetKrishnan commented 10 months ago

Example from MuJoCo environments

https://github.com/Farama-Foundation/Gymnasium/blob/443b1940f11087280663e884edea571f47f72413/gymnasium/envs/mujoco/mujoco_env.py#L30

This works because the observation space is a parameter of type Space passed during env creation. If instead an environment defines its observation and/or action spaces in terms of the pre-defined gym.spaces classes like Discrete, Box or MultiBinary, then the types of the class attributes self.observation_space and self.action_space become those concrete gym.spaces classes. Specifying the type arguments on gym.Env then doesn't work, because it expects, e.g., self.observation_space to have type Space[ObsType], but it gets Box (or whatever the concrete space is).

This is probably only an issue if you have strict type-checking with mypy enabled, and may not be a priority. I did find it a bit inconvenient though that you couldn't deduce the observation/action type of an environment when using a specific Space class for the self.observation_space and self.action_space attributes.

Here's a MWE -

from __future__ import annotations

from typing import Any, Dict, Tuple

import gymnasium as gym
import numpy as np

class SimpleEnv(gym.Env[np.ndarray[int, np.dtype[np.float64]], int]):
    def __init__(self) -> None:
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(1,))

    def reset(
        self, seed: int | None = None, options: Dict[str, Any] | None = None
    ) -> Tuple[np.ndarray[int, np.dtype[np.float64]], Dict[str, Any]]:
        self.state = np.random.uniform(low=0, high=1, size=(1,))
        return self.state, {}

    def step(
        self, action: int
    ) -> Tuple[
        np.ndarray[int, np.dtype[np.float64]], float, bool, bool, Dict[str, Any]
    ]:
        return self.state, np.random.rand(), False, False, {}

    def render(self) -> None:
        print(self.state)

Running mypy with mypy --strict simple.py fails with -

$ mypy --strict simple.py
simple.py:9: error: Incompatible types in assignment (expression has type "Discrete", variable has type "Space[int]")  [assignment]
Found 1 error in 1 file (checked 1 source file)
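(As an aside on the annotation in the MWE: numpy.typing.NDArray is the cleaner way to spell these array types, since np.ndarray's first type parameter is the shape type, not the element type - NDArray[np.float64] expands to np.ndarray[Any, np.dtype[np.float64]]:)

```python
import numpy as np
from numpy.typing import NDArray

# NDArray[np.float64] is an alias for np.ndarray[Any, np.dtype[np.float64]];
# the dtype goes in the second slot, so no shape type needs to be spelled out.
x: NDArray[np.float64] = np.zeros((3,))

print(type(x), x.dtype)
```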

pseudo-rnd-thoughts commented 10 months ago

Thanks @AbhijeetKrishnan for the MWE

The reported error is a bit misleading, as Discrete is Space[np.int64] (it was recently changed from int to np.int64). So if you change int to np.int64 in the appropriate places, I don't get any issues -

from __future__ import annotations

from typing import Any, Dict, Tuple

import gymnasium as gym
import numpy as np
from numpy.typing import NDArray

class SimpleEnv(gym.Env[NDArray[np.float64], np.int64]):

    def __init__(self) -> None:
        self.action_space: gym.Space[np.int64] = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(1,))

        self.state = np.zeros((1,))

    def reset(
        self, seed: int | None = None, options: Dict[str, Any] | None = None
    ) -> Tuple[NDArray[np.float64], Dict[str, Any]]:
        super().reset(seed=seed, options=options)

        self.state = self.np_random.uniform(low=0, high=1, size=(1,))
        return self.state, {}

    def step(
        self, action: np.int64
    ) -> Tuple[NDArray[np.float64], float, bool, bool, Dict[str, Any]]:
        reward = float(self.np_random.normal(loc=0, scale=1, size=()))

        return self.state, reward, False, False, {}

    def render(self) -> None:
        print(self.state)