Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Proposal] Allowing environments with different Box action/observation space limits in a vector environment #775

Open howardh opened 7 months ago

howardh commented 7 months ago

Proposal

https://github.com/Farama-Foundation/Gymnasium/blob/8333df8666811d1d0f87f1ca71803cc58bcf09c6/gymnasium/vector/sync_vector_env.py#L251-L266 https://github.com/Farama-Foundation/Gymnasium/blob/8333df8666811d1d0f87f1ca71803cc58bcf09c6/gymnasium/vector/async_vector_env.py#L576-L596

Currently, these methods check whether all the observation and action spaces in a vector environment are identical, and raise an error if they are not. I'm assuming this is the case because we want to ensure that we can stack the observations and actions into one numpy array. I'm proposing a change to allow differences in the observation and action spaces as long as the shapes are consistent (e.g. differences in the low and high values of a Box space).

The change could be gated behind an optional parameter, set when creating the vector environment, so that the current default behaviour is preserved for now.

Motivation

I want to vectorize environments with different action space boundaries but the current implementation of vector environments does not allow for that.


pseudo-rnd-thoughts commented 7 months ago

This is a very good point that I had never thought about. Space.__eq__ checks for exact equality, whereas what we actually care about here is Space.shape equivalence. I would want to test how some of the newer (weird) spaces (Sequence, Graph and Text) behave with vector environments, but I see no issue with changing the == to has_space_shape_equivalence(env_1, env_2).

@howardh Would you be happy to implement the shape equivalence function (probably in vector.utils), add tests for it, change the vector code and add tests on the vector side as well?
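A minimal sketch of what such a check might look like, combined with the opt-in flag from the proposal. The name has_space_shape_equivalence comes from the comment above; its signature and body, the allow_varying_bounds flag, the spaces_are_compatible helper and the StubBox class are all assumptions made for illustration and are not part of Gymnasium. The check is duck-typed (it only reads .shape and .dtype), and composite spaces such as Dict or Tuple would presumably need a recursive version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubBox:
    """Hypothetical stand-in for a Box space; not part of Gymnasium."""

    low: tuple
    high: tuple
    dtype: str = "float32"

    @property
    def shape(self):
        return (len(self.low),)


def has_space_shape_equivalence(space_1, space_2):
    """Return True if both spaces share type, shape and dtype (bounds are ignored)."""
    return (
        type(space_1) is type(space_2)
        and space_1.shape == space_2.shape
        and space_1.dtype == space_2.dtype
    )


def spaces_are_compatible(spaces, allow_varying_bounds=False):
    """Compare every space against the first one.

    With allow_varying_bounds=False this mirrors a strict `==` check;
    with True, only type, shape and dtype have to match.
    """
    reference = spaces[0]
    if allow_varying_bounds:
        return all(has_space_shape_equivalence(reference, s) for s in spaces)
    return all(reference == s for s in spaces)


narrow = StubBox(low=(-1.0, -1.0), high=(1.0, 1.0))
wide = StubBox(low=(-2.0, -2.0), high=(2.0, 2.0))
longer = StubBox(low=(0.0, 0.0, 0.0), high=(1.0, 1.0, 1.0))

# Strict check: different bounds are rejected.
print(spaces_are_compatible([narrow, wide]))
# Relaxed check: same shape and dtype are enough.
print(spaces_are_compatible([narrow, wide], allow_varying_bounds=True))
# Relaxed check still rejects a different shape.
print(spaces_are_compatible([narrow, longer], allow_varying_bounds=True))
```

Keeping the strict comparison as the default means existing vector environments would behave as before unless the flag is set.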

reginald-mclean commented 4 days ago

@pseudo-rnd-thoughts I am happy to do these tasks if @howardh hasn't

pseudo-rnd-thoughts commented 4 days ago

@reginald-mclean go for it.

Looking back at this, another change is that the batched space should reflect the composition of the sub-spaces in terms of low, high, etc.
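As a rough illustration of a batched space reflecting each sub-space's low and high, the bounds can be stacked along a new leading axis so that row i carries the limits of sub-environment i. This sketch uses plain NumPy arrays rather than Gymnasium spaces, and the bound values are made up for the example.

```python
import numpy as np

# Bounds of two hypothetical sub-environments, both with shape (2,).
lows = [np.zeros(2, dtype=np.float32), np.full(2, 2.0, dtype=np.float32)]
highs = [np.ones(2, dtype=np.float32), np.array([3.0, 5.0], dtype=np.float32)]

# Stack along a new leading axis: row i holds the bounds of sub-environment i.
batched_low = np.stack(lows)
batched_high = np.stack(highs)
print(batched_low.shape)

# Draw one batched sample; the low/high arrays are applied element-wise.
rng = np.random.default_rng(0)
sample = rng.uniform(batched_low, batched_high)

# Each row of the sample respects the bounds of its own sub-environment.
in_bounds = [
    bool(np.all((lows[i] <= sample[i]) & (sample[i] <= highs[i])))
    for i in range(len(lows))
]
print(in_bounds)
```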

pseudo-rnd-thoughts commented 4 days ago

I got bored watching the UK general election so did most of the code for batching varying spaces

"""The batch space module."""

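# Postpone annotation evaluation so `list[Space]` and `A | B` hints work on older Pythons.
from __future__ import annotations

from copy import deepcopy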
from functools import singledispatch

import numpy as np
import pytest

from gymnasium import Space
from gymnasium.spaces import (
    Box,
    Dict,
    Discrete,
    Graph,
    MultiBinary,
    MultiDiscrete,
    OneOf,
    Sequence,
    Text,
    Tuple,
)
from gymnasium.vector.utils import batch_space, iterate
from tests.spaces.utils import TESTING_SPACES, TESTING_SPACES_IDS

@singledispatch
def batch_spaces(spaces: list[Space]):
    """Batch a sequence of spaces, allowing the subspaces to contain minor differences."""
    assert len(spaces) > 0
    assert all(isinstance(space, type(spaces[0])) for space in spaces)
    assert type(spaces[0]) in batch_spaces.registry

    # singledispatch would dispatch on `list`, so look up the implementation
    # registered for the element type and call it directly.
    return batch_spaces.dispatch(type(spaces[0]))(spaces)

@batch_spaces.register(Box)
def _batch_spaces_box(spaces: list[Box]):
    # All sub-spaces must share a dtype so their bounds can be stacked.
    assert all(spaces[0].dtype == space.dtype for space in spaces)

    return Box(
        low=np.array([space.low for space in spaces]),
        high=np.array([space.high for space in spaces]),
        dtype=spaces[0].dtype,
        seed=deepcopy(spaces[0].np_random),
    )

@batch_spaces.register(Discrete)
def _batch_spaces_discrete(spaces: list[Discrete]):
    return MultiDiscrete(
        nvec=np.array([space.n for space in spaces]),
        start=np.array([space.start for space in spaces]),
    )

@batch_spaces.register(MultiDiscrete)
def _batch_spaces_multi_discrete(spaces: list[MultiDiscrete]):
    # Box bounds are inclusive, so the largest valid value is start + nvec - 1.
    return Box(
        low=np.array([space.start for space in spaces]),
        high=np.array([space.start + space.nvec for space in spaces]) - 1,
        dtype=spaces[0].dtype,
        seed=deepcopy(spaces[0].np_random),
    )

@batch_spaces.register(MultiBinary)
def _batch_spaces_multi_binary(spaces: list[MultiBinary]):
    assert all(spaces[0].shape == space.shape for space in spaces)

    return Box(
        low=0,
        high=1,
        shape=(len(spaces),) + spaces[0].shape,
        dtype=spaces[0].dtype,
        seed=deepcopy(spaces[0].np_random),
    )

@batch_spaces.register(Tuple)
def _batch_spaces_tuple(spaces: list[Tuple]):
    return Tuple(
        tuple(
            batch_spaces(subspaces)
            for subspaces in zip(*[space.spaces for space in spaces])
        ),
        seed=deepcopy(spaces[0].np_random),
    )

@batch_spaces.register(Dict)
def _batch_spaces_dict(spaces: list[Dict]):
    assert all(spaces[0].keys() == space.keys() for space in spaces)

    return Dict(
        {
            key: batch_spaces([space[key] for space in spaces])
            for key in spaces[0].keys()
        },
        seed=deepcopy(spaces[0].np_random),
    )

@batch_spaces.register(Graph)
@batch_spaces.register(Text)
@batch_spaces.register(Sequence)
@batch_spaces.register(OneOf)
def _batch_spaces_undefined(spaces: list[Graph | Text | Sequence | OneOf]):
    # These spaces are not stacked into one array; fall back to a Tuple of the sub-spaces.
    return Tuple(spaces, seed=deepcopy(spaces[0].np_random))

@pytest.mark.parametrize(
    "spaces,expected_space",
    [
        (
            (
                Box(low=0, high=1, shape=(2,), dtype=np.float32),
                Box(low=2, high=np.array([3, 5], dtype=np.float32)),
            ),
            Box(low=np.array([[0, 0], [2, 2]]), high=np.array([[1, 1], [3, 5]])),
        ),
        # Further test cases were left unfilled in this draft.
    ],
)
def test_varying_spaces(spaces: list[Space], expected_space):
    """Test the batch spaces function."""
    batched_space = batch_spaces(spaces)
    assert batched_space == expected_space

    batch_samples = batched_space.sample()
    for sub_space, sub_sample in zip(spaces, iterate(batched_space, batch_samples)):
        assert sub_sample in sub_space

@pytest.mark.parametrize("space", TESTING_SPACES, ids=TESTING_SPACES_IDS)
@pytest.mark.parametrize("n", [1, 3])
def test_batch_spaces_vs_batch_space(space, n):
    """Test the batch_spaces and batch_space functions."""
    batched_space = batch_space(space, n)
    batched_spaces = batch_spaces([deepcopy(space) for _ in range(n)])

    assert batched_space == batched_spaces, f"{batched_space=}, {batched_spaces=}"

It is missing some more testing and integration with vector environments. Also, I'm not sure if batch_spaces is a good name, as I fear it is too close to batch_space, though the number of parameters differs.