Redesign Environment API

cswinter commented 2 years ago

This is a large refactor of the Environment API. It includes a new tutorial describing how to implement an Environment. The main change is the way EntityIDs are handled:

Observation.ids is now a Mapping[EntityType, Sequence[EntityID]] rather than just a Sequence[EntityID] to make the correspondence between entities and IDs more explicit.
Observation.ids does not need to be set for every entity type, only for the types that are referenced by an action.
The *ActionMask classes now have two ways of specifying the actors:
- actor_ids: Sequence[EntityID] takes a list of EntityIDs, as opposed to the indices that the old actors field took.
- Alternatively, actor_types: Sequence[EntityType] specifies just the types of entities that can perform the action, without listing individual ids.
- Example: previously, one might have specified the mask as actors = np.array([1, 2, 3]), where 1, 2, 3 happened to be the indices of all entities of some type after it was joined with all other entity types according to the obs_space ordering. Now, one could specify this as e.g. actor_ids = [("Robot", 0), ("Robot", 1), ("Robot", 2)], or simply actor_types = ["Robot"] if the action is available to all entities of the "Robot" type.
- It is permissible to omit both actor_ids and actor_types, in which case all entities can perform the action.
- Similarly, SelectEntityAction now has actee_ids and actee_types.
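The actor rules above (explicit actor_ids, actor_types, or neither) can be illustrated with a small, dependency-free sketch. CategoricalActionMask and resolve_actors here are hypothetical stand-ins written for this illustration, not the library's classes. EntityID is assumed to be any hashable value such as ("Robot", 0), and treating actor_ids and actor_types as mutually exclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Hashable, List, Mapping, Optional, Sequence

EntityType = str
EntityID = Hashable  # assumption: any hashable id, e.g. ("Robot", 0)


@dataclass
class CategoricalActionMask:
    # Hypothetical stand-in for the *ActionMask classes described above.
    actor_ids: Optional[Sequence[EntityID]] = None
    actor_types: Optional[Sequence[EntityType]] = None


def resolve_actors(
    mask: CategoricalActionMask,
    ids: Mapping[EntityType, Sequence[EntityID]],
) -> List[EntityID]:
    """Return the EntityIDs of all entities allowed to perform the action."""
    if mask.actor_ids is not None and mask.actor_types is not None:
        # Assumption: the two ways of specifying actors are alternatives.
        raise ValueError("set actor_ids or actor_types, not both")
    if mask.actor_ids is not None:
        return list(mask.actor_ids)
    if mask.actor_types is not None:
        return [eid for etype in mask.actor_types for eid in ids.get(etype, [])]
    # Neither field set: every entity can act. Because these entities are
    # referenced by an action, their ids must be present in `ids`.
    return [eid for type_ids in ids.values() for eid in type_ids]


# Observation.ids only needs entries for types that actions refer to.
ids = {"Robot": [("Robot", 0), ("Robot", 1), ("Robot", 2)], "Ore": [("Ore", 0)]}
print(resolve_actors(CategoricalActionMask(actor_types=["Robot"]), ids))
# -> [('Robot', 0), ('Robot', 1), ('Robot', 2)]
```

Specifying actor_types=["Robot"] here yields the same actors as listing all three robot ids explicitly, which is the equivalence the example above describes.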
There is now a clear split between Environment, which implements a maximally convenient interface, and VecEnv, which implements a maximally efficient interface. In Environment, actors/actees are referenced via environment-specific ids, while VecEnv references everything with indices. The translation happens in ListEnv, which converts all EntityIDs into indices when batching observations, keeps track of the EntityID mapping from the last observation, and converts indices back into EntityIDs in the act method. As a side benefit, this pushes the work of constructing the EntityID lists into the subprocesses when using parallel envs, which allows for additional speedups (I've observed > 22K samples/s now).
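The id/index translation described above can be sketched as follows. IdTranslator and its methods are hypothetical names for illustration; the sketch assumes entities are laid out by type in obs_space order, that the number of entities per type is known from the features, and that types without ids still occupy indices but cannot be referenced.

```python
from typing import Dict, Hashable, List, Mapping, Optional, Sequence

EntityType = str
EntityID = Hashable


class IdTranslator:
    """Hypothetical sketch of the EntityID <-> index bookkeeping done by ListEnv."""

    def __init__(self, type_order: Sequence[EntityType]) -> None:
        self.type_order = list(type_order)  # entity types in obs_space order
        self.index_to_id: List[Optional[EntityID]] = []
        self.id_to_index: Dict[EntityID, int] = {}

    def batch(
        self,
        counts: Mapping[EntityType, int],
        ids: Mapping[EntityType, Sequence[EntityID]],
    ) -> None:
        """Record which EntityID sits at each flat index of the latest observation."""
        self.index_to_id = []
        for etype in self.type_order:
            n = counts.get(etype, 0)
            type_ids = ids.get(etype)
            if type_ids is None:
                # Types without ids still take up indices but cannot be referenced.
                self.index_to_id.extend([None] * n)
            elif len(type_ids) != n:
                raise ValueError(f"{etype}: {len(type_ids)} ids for {n} entities")
            else:
                self.index_to_id.extend(type_ids)
        self.id_to_index = {
            eid: i for i, eid in enumerate(self.index_to_id) if eid is not None
        }

    def to_indices(self, entity_ids: Sequence[EntityID]) -> List[int]:
        """EntityIDs -> flat indices, e.g. when building masks for the VecEnv."""
        return [self.id_to_index[eid] for eid in entity_ids]

    def to_ids(self, indices: Sequence[int]) -> List[Optional[EntityID]]:
        """Flat indices chosen by the policy -> EntityIDs, e.g. inside act."""
        return [self.index_to_id[i] for i in indices]


t = IdTranslator(["Ore", "Robot"])
t.batch(counts={"Ore": 1, "Robot": 3},
        ids={"Robot": [("Robot", 0), ("Robot", 1), ("Robot", 2)]})
print(t.to_indices([("Robot", 0), ("Robot", 1), ("Robot", 2)]))  # -> [1, 2, 3]
print(t.to_ids([3]))  # -> [('Robot', 2)]
```

With one Ore entity ordered first, the three robots land at indices 1, 2, 3, which mirrors the old actors = np.array([1, 2, 3]) example: the environment only ever sees ids, and only the batching layer deals with indices.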

Smaller changes:

Observation.entities renamed to Observation.features
Observation.action_masks renamed to Observation.actions
Environment._act|_reset renamed to Environment.act|reset, and Environment.act|reset renamed to Environment.act_filter|reset_filter
*Batch renamed to Vec* for consistency with VecEnv
The Action types now have separate fields for the actors and the actions/actees, rather than a single list that combines both, and an items() method that returns an iterator over the (actor, action) pairs
New type aliases EntityType = str and ActionType = str make the intention of the API more obvious.
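A rough sketch of the split actor/action fields and the items() method. The class names come from the description above, but the field names (actors, actions, actees), the length check, and passing actions to act as a Mapping[ActionType, ...] are assumptions made for this illustration, not the library's exact definitions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterator, Sequence, Tuple

EntityID = Hashable
ActionType = str  # alias described above


@dataclass
class CategoricalAction:
    """Hypothetical: actors and their chosen choice indices as parallel fields."""

    actors: Sequence[EntityID]
    actions: Sequence[int]

    def __post_init__(self) -> None:
        if len(self.actors) != len(self.actions):
            raise ValueError("actors and actions must have the same length")

    def items(self) -> Iterator[Tuple[EntityID, int]]:
        """Iterate over (actor, action) pairs."""
        return zip(self.actors, self.actions)


@dataclass
class SelectEntityAction:
    """Hypothetical: each actor paired with the EntityID it selected."""

    actors: Sequence[EntityID]
    actees: Sequence[EntityID]

    def __post_init__(self) -> None:
        if len(self.actors) != len(self.actees):
            raise ValueError("actors and actees must have the same length")

    def items(self) -> Iterator[Tuple[EntityID, EntityID]]:
        """Iterate over (actor, actee) pairs."""
        return zip(self.actors, self.actees)


# Inside an environment's act method (assumed to receive one entry per action type):
actions: Dict[ActionType, CategoricalAction] = {
    "move": CategoricalAction(actors=[("Robot", 0), ("Robot", 2)], actions=[1, 3]),
}
for actor, choice in actions["move"].items():
    print(actor, choice)
```

Keeping actors and actions in separate sequences lets the batching layer fill each field directly from index arrays, while items() keeps per-entity handling in the environment straightforward.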

cswinter commented 2 years ago

We split out the IDs into entity types... but why don't we just put the IDs along with the entities in the features? Having both in the same place is more coherent and much easier to debug, e.g.:

I think we could do this. I actually started out combining all of "actions", "ids", and "features" into a single EntityObs, so that we have a single entities: Mapping[EntityType, EntityObs], but then changed back to the previous scheme since it made it harder to figure out all the algorithmic changes. I think it probably still makes sense to keep actions separate, since actions feel like a top-level object and often don't make sense to split by entity type, but joining ids and entities seems good. Maybe we could also have a constructor/function on Environment to which you pass just the lists of ids and features, and which does the whole np.array(... thing and validates that the number of features matches and that the number of ids matches the number of entities.
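The validating constructor proposed above could look roughly like this. build_entities, its signature, and the (features, ids) tuple input are hypothetical choices for illustration only; this is not the signature of any method in the library. To stay dependency-free the sketch keeps plain float lists, whereas a real version would presumably build np.arrays at the marked step.

```python
from typing import Dict, Hashable, List, Mapping, Optional, Sequence, Tuple

EntityType = str
EntityID = Hashable
# Per entity type: (feature rows, ids or None if the type is never referenced).
EntityInput = Tuple[Sequence[Sequence[float]], Optional[Sequence[EntityID]]]


def build_entities(
    feature_names: Mapping[EntityType, Sequence[str]],
    entities: Mapping[EntityType, EntityInput],
) -> Tuple[Dict[EntityType, List[List[float]]], Dict[EntityType, List[EntityID]]]:
    """Hypothetical helper: split side-by-side (features, ids) into two validated mappings."""
    features: Dict[EntityType, List[List[float]]] = {}
    ids: Dict[EntityType, List[EntityID]] = {}
    for etype, (rows, type_ids) in entities.items():
        if etype not in feature_names:
            raise ValueError(f"unknown entity type {etype!r}")
        width = len(feature_names[etype])
        for row in rows:
            if len(row) != width:
                raise ValueError(f"{etype}: expected {width} features, got {len(row)}")
        if type_ids is not None and len(type_ids) != len(rows):
            raise ValueError(f"{etype}: {len(type_ids)} ids for {len(rows)} entities")
        # Normalize to floats; a real implementation would likely build np.arrays here.
        features[etype] = [[float(x) for x in row] for row in rows]
        if type_ids is not None:
            ids[etype] = list(type_ids)
    return features, ids


feature_names = {"Robot": ["x", "y"], "Ore": ["x", "y"]}
features, ids = build_entities(
    feature_names,
    {
        "Robot": ([[0, 1], [2, 3]], [("Robot", 0), ("Robot", 1)]),
        "Ore": ([], None),  # no entities and no ids: no empty-array shape to get right
    },
)
```

Because the caller passes plain lists, the helper rather than the environment author is responsible for shapes, dtypes, and the ids-versus-entities count check.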

cswinter commented 2 years ago

@Bam4d made a bunch of fixes and also added a new Observation.from_entity_obs method that allows specifying ids and features side-by-side in the same dict.

cswinter commented 2 years ago

It's also completely optional to use np.array, so you don't need to think about dtypes or the right shape for empty arrays.

Bam4d commented 2 years ago

I'll review this by making the necessary Griddly changes and seeing if I run into any issues!

Bam4d commented 2 years ago

Going to try to get this released today so we can build and test with Poetry etc.: https://github.com/Bam4d/Griddly/pull/165/files

cswinter commented 2 years ago

@Bam4d Thanks for the fixes!

entity-neural-network / incubator

Redesign Environment API #166