cswinter opened this issue 2 years ago:

Currently, the observation space of each Entity is defined as a single flat list of features which are all assumed to be scalars: https://github.com/entity-neural-network/incubator/blob/be33cae8355d0f79374718e2293983bfc5827779/entity_gym/entity_gym/environment.py#L86-L88

There are other feature shapes that we might want to support. As @jeremysalwen noted, explicitly modeling the structure of the input space in this way has at least two advantages. Another consideration is the allowed type of elements: currently we only support floats, but we should at least also support int/categorical features that can be one-hot encoded (tracked in entity-neural-network/entity-gym#2).
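For context, the linked definition amounts to roughly the following (paraphrased here, not the verbatim source):

from dataclasses import dataclass
from typing import List

@dataclass
class Entity:
    # A flat list of feature names; every feature is implicitly a scalar float.
    features: List[str]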
I'd like to start working on this. For a quick start we could simply refactor Entity.features to be a Dict[str, np.ndarray] and then rework the other interfaces that rely on Entity.

If we want to be more proper about this, we could introduce a new type Tensor that has a value property and a get_shape() method (and other convenience methods), and let Entity.features be a Dict[str, Tensor]. Tensor will simply be a wrapper around a numpy or torch tensor, but it prevents us from being tightly coupled to a single framework.
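A minimal sketch of what that wrapper could look like (all names here are illustrative, nothing is settled):

from dataclasses import dataclass
from typing import Any, Tuple

@dataclass
class Tensor:
    # The wrapped value: an np.ndarray or a torch.Tensor, both of which
    # expose a compatible .shape attribute.
    _value: Any

    @property
    def value(self) -> Any:
        return self._value

    def get_shape(self) -> Tuple[int, ...]:
        return tuple(self._value.shape)

# Usage: Tensor(np.zeros((3, 2))).get_shape() == (3, 2)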
@cswinter and others, curious to hear your thoughts on this approach?
Some initial thoughts:

There are two separate concerns here: (1) what is the API for defining and creating observations (currently Entity), and (2) what is the representation/memory layout of observations (currently np.NDArray[np.float32]).

For (1), we don't necessarily need np.ndarray, maybe a set of dataclasses. To describe the shape of features, probably a tuple will do, similar to the shape property of np.ndarray/torch.tensor.

For (2), np.NDArray[np.float32] is the most straightforward/performant solution. For debugging purposes, we'll want a method that can extract the value of a given feature from the flattened array (something like the sketch below).
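As a hypothetical example of such a debugging helper, assuming features are laid out along the last axis in declaration order:

from typing import List

import numpy as np

def extract_feature(feature_names: List[str], flat: np.ndarray, name: str) -> np.ndarray:
    # Look up the feature's column in the flat float32 layout; works for a
    # single entity (1D array) or a batch of entities (2D array).
    return flat[..., feature_names.index(name)]

# extract_feature(["x", "y", "color"], np.array([[1.0, 2.0, 3.0]]), "y") -> array([2.])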
Thanks for the response @cswinter! Perhaps there are some considerations I'm not seeing, but I think this and entity-neural-network/entity-gym#2 can be solved together using the approach I outlined above, with some modifications: a Tensor could be defined to be a Union[Continuous, Discrete]. For example, currently the observation space of MultiSnake looks like this:
ObsSpace(
    {
        "SnakeHead": Entity(["x", "y", "color"]),
        "SnakeBody": Entity(["x", "y", "color"]),
        "Food": Entity(["x", "y", "color"]),
    }
)
Instead of "Food": Entity(["x", "y", "color"])
, we could have something like:
"Food": Entity({
"x": Continuous(shape=(,)), # scalar shape is an empty tuple
"y": Continuous(shape=(,)),
"color": Discrete(num_values=4) # categorical variable
}),
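The Continuous and Discrete types used above could be small dataclasses, along these lines (a sketch; Variable is the union type the rest of this thread refers to):

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class Continuous:
    shape: Tuple[int, ...] = ()  # defaults to a scalar

@dataclass
class Discrete:
    num_values: int  # number of categories, e.g. for one-hot encoding

Variable = Union[Continuous, Discrete]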
Yep, that's pretty much what I had in mind!

Small suggestion: I think it would be good for the class names to be short, since they will get written a lot. Maybe Float or Real instead of Continuous? Can't think of anything that would be shorter than Discrete.
Some random thoughts:

Discrete values could also have a shape. Maybe we want to separately specify types and shapes?

We might want to declare some features as Discrete/Int (e.g. positions on a grid) and use them in e.g. absolute positional encodings or embeddings, but not one-hot encode them. Maybe the encoding should also be specified separately? Maybe that's not a good idea though.
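One way to keep those choices orthogonal would be to make type, shape, and encoding independent fields (purely illustrative, none of these names exist yet):

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Feature:
    dtype: str = "float"            # "float" or "int"
    shape: Tuple[int, ...] = ()     # e.g. (2,) for a position on a grid
    encoding: Optional[str] = None  # e.g. "one_hot", "embedding", or None for raw values

# An int-valued grid position fed to a positional encoding, not one-hot encoded:
grid_position = Feature(dtype="int", shape=(2,), encoding="positional")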
Hey @cswinter I started implementing the interface we discussed: https://github.com/dtch1997/incubator/tree/feature/add_feature_types
A quick question: how set are you on having the internal representation for an Entity instance be a single flat np.ndarray? It seems like that creates a few more problems, because (1) the different feature values could have different data types, and (2) we'd need functions for converting the internal representation to the correct data type/shape of each feature.
To me the most elegant solution would be to let an Entity instance (let's call it EntityValue) be a Dict[str, np.ndarray] instead of a flat np.ndarray, which avoids having to do any conversion and also makes it much simpler to filter the features to form an ObservationSpace.
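Concretely, that could look like the following (hypothetical EntityValue; one array entry per entity instance):

from typing import Dict

import numpy as np

EntityValue = Dict[str, np.ndarray]

# Two Food instances; each feature keeps its natural dtype, so no up-front
# conversion into a single flat float32 array is required.
food: EntityValue = {
    "x": np.array([1.0, 3.0], dtype=np.float32),
    "y": np.array([2.0, 0.0], dtype=np.float32),
    "color": np.array([0, 3], dtype=np.int64),
}

# Filtering features down to an observation space is a dict comprehension:
visible = {name: food[name] for name in ("x", "y")}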
I was also thinking that we could allow Entity to be a hierarchical construct, i.e. Entity.features can be a Dict[str, Union['Entity', Variable]], so that we can compose smaller entities to form larger ones. Let me know your thoughts.
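For example (a self-contained sketch; the dict-based Entity and the Variable union are hypothetical):

from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass
class Continuous:
    shape: Tuple[int, ...] = ()

@dataclass
class Discrete:
    num_values: int

Variable = Union[Continuous, Discrete]

@dataclass
class Entity:
    features: Dict[str, Union["Entity", Variable]]

# Compose a reusable sub-entity into a larger one:
position = Entity({"x": Continuous(), "y": Continuous()})
snake_head = Entity({
    "position": position,            # nested Entity
    "color": Discrete(num_values=4),
})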
To me the most elegant solution would be to let an Entity instance (let's call it EntityValue) be a Dict[str, np.ndarray] instead of a flat np.ndarray which avoids having to do any conversion and also makes it much simpler to filter the features to form an ObservationSpace.
Yeah, I think something like the Dict[str, np.ndarray] might be what we want the API to ultimately look like, it's just going to be a good amount of work to still allow it to be performant and we'll have to be careful about how exactly it's set up. There are a couple of places that could become a bottleneck for environments that have more than a small number of features (environments that implement the VecEnv interface directly might be OK).

I'm fairly sure this could all still be done efficiently in some way, maybe with a version of the RaggedBuffer type that supports multiple features and handles all the iterating over features internally. We probably still want to convert to the more condensed representation used by the network architecture as soon as possible, so we only need to perform the conversion once: probably as soon as we receive the observation from the environment, and before feeding it to the network and pushing it onto the sample buffer.
The approach I would take is to first figure out what the efficient encoded representation of everything should be, since this is what we want the network architecture and PPO code to use (and also makes that code a lot simpler). Right now, our network architecture doesn't support anything other than a flat tensor of floats and it's slightly unclear how more complex things are going to work, so I think it makes sense to still stick to that representation at least internally. We can then add a conversion layer that enables a more ergonomic API for environments, while still allowing them to directly supply the more efficient representation.
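A conversion layer in that spirit might look like this (a sketch, assuming per-feature arrays get stacked into the flat float32 layout in a fixed feature order, once per received observation):

from typing import Dict, List

import numpy as np

def condense(features: Dict[str, np.ndarray], order: List[str]) -> np.ndarray:
    # Stack one column per feature into the (num_entities, num_features)
    # float32 array consumed by the network architecture and PPO code.
    # Called once, right after an observation is received from the environment.
    return np.stack([features[name].astype(np.float32) for name in order], axis=-1)

# condense({"x": np.array([1, 3]), "y": np.array([2, 0])}, ["x", "y"])
# -> array([[1., 2.], [3., 0.]], dtype=float32)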
I was also thinking that we could allow Entity to be a hierarchical construct, i.e. Entity.features can be a Dict[str, Union['Entity', Variable]], so that we can compose smaller entities to form larger ones. Let me know your thoughts.
We could already compose entities by just merging the feature dict (and merge in multiple instances of the same entity by prefixing the features). I suppose modeling this at the level of the API could allow for things like joint feature normalization across the sub-entities. My sense is that this would complicate a lot of code and wouldn't be worth the trouble at this time. But I also don't think I fully understand the use case for this yet, did you have a particular example in mind?
Yeah I think something like the Dict[str, np.ndarray] might be what we want the API to ultimately look like, it's just going to be a good amount of work to still allow it to be performant and we'll have to be careful about how exactly it's set up.
I see, okay. Concretely, I was thinking of having both an object-oriented version and a flattened version of Observation. The object-oriented version can be used internally by Environment, such as in Environment._compile_feature_filter. Then it can be flattened once it is passed to the neural network. I haven't encountered any cases where you would need to do the reverse operation (going from the flattened representation to the object-oriented one). IMO splitting it up like this would make implementing new Environments a lot simpler and avoid most of the performance issues you described.
FWIW, in my work with OpenAI Gym this is mostly how I manage complex observation spaces too. The gym.Env can have a dictionary observation space, and it gets flattened down to an array just before it gets passed to the policy network.
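For example, with Gym's built-in space-flattening helpers:

import numpy as np
from gym import spaces

obs_space = spaces.Dict({
    "position": spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
    "color": spaces.Discrete(4),
})

obs = obs_space.sample()               # nested dict observation
flat = spaces.flatten(obs_space, obs)  # 1D float array mixing the Box values and a one-hot Discrete
assert flat.shape == (spaces.flatdim(obs_space),)  # 2 + 4 = 6

# spaces.unflatten(obs_space, flat) recovers the dict form, e.g. for debugging.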
We could already compose entities by just merging the feature dict (and merge in multiple instances of the same entity by prefixing the features). I suppose modeling this at the level of the API could allow for things like joint feature normalization across the sub-entities. My sense is that this would complicate a lot of code and wouldn't be worth the trouble at this time. But I also don't think I fully understand the use case for this yet, did you have a particular example in mind?
I think there isn't a solid need for this yet in the currently implemented environments, but it might become a useful abstraction for more complex environments that have a natural hierarchy or structure to them. It's more of a forward-thinking design decision; we definitely don't have to worry about it for now.
I think the main use cases for going from the flattened to the object-oriented version would be things like debugging, logging metrics of (flattened) feature statistics, and turning recorded sample traces back into features.