Closed cswinter closed 2 years ago
We split out the IDS into entity types... but why don't we just put the IDS along with the entities in the features? Having the same place for both things is more coherent and much easier to debug e.g:
I think we could do this, I actually started out combining all of "actions", "ids", and "features" into a single EntityObs
so we have a single entities: Mapping[EntityType, EntityObs]
, but then changed back to the previous scheme since it made it harder to figure out all the algorithmic changes. I think it probably makes sense to still keep actions
separate, since actions feel like a top-level object and often don't make sense to split across entity type, but joining ids and entities seems good. Maybe we could also have a constructor/function on Environment
which you pass just the lists of ids and features, and it does the whole np.array(...
thing and validates that the number of features match and the number of ids match the number of entities.
@Bam4d made a bunch of fixes and also added a new Observation.from_entity_obs
method that allows specifying ids and features side-by-side in the same dict.
It's also completely optional to use np.array
, so you don't need to think about the right shape for empty arrays or dtype.
Ill review this by doing the necessary Griddly changes and seeing if I run into any issues!
going to try get released today so we can build and test with poetry etc... https://github.com/Bam4d/Griddly/pull/165/files
@Bam4d Thanks for the fixes!
This is a large refactor of the
Environment
API. It includes a new tutorial describing how to implement anEnvironment
The main change is the wayEnityID
s are handled:Observation.ids
is now aMapping[EntityType, Sequence[EntityID]]
rather than just aSequence[EntityID]
to make the correspondence between entities and IDs more explicit.Observation.ids
for all entity types, only if they are referenced by an action.*ActionMask
classes now have two ways of specifying the actors:actor_ids: Sequence[EntityID]
now takes a list ofEntityID
s as opposed to indices as was the case foractors
before alternatively,actor_types: Sequence[EntityType]
allows specifying just the types of entities that can perform the action, without listing individual ids.actors = np.array([1, 2, 3])
where 1, 2, 3 happened to be the indices of all entities of some type after joined with all other entity types according to theobs_space
ordering. Now, one could specify this as e.g.actor_ids = [("Robot", 0), ("Robot", 1), ("Robot", 2)]
or even justactor_types = ["Robot"]
in the case where the action is available to all entities of the "Robot" type.actor_ids
andactor_types
, in which case all entities can perform the action.SelectEntityAction
now hasactee_ids
andactee_types
Environment
, which implements a maximally convenient interface, andVecEnv
, which implements a maximally efficient interface. InEnvironment
, actors/actees are referenced via environment-specific ids, andVecEnv
references everything with indices. The translation happens inListEnv
, which converts allEntityID
s into indices when batching observations, keeps track of theEntityID
mapping from the last observation, and converts indices back intoEntityID
s during theact
method. As a side benefit, this pushes the work of constructing theEntityID
lists into the subprocesses when using parallel envs which allows for additional speedups (I've observed > 22K samples/s now).Smaller changes:
Observation.entities
renamed toObservation.features
Observation.action_masks
renamed toObservation.actions
Environment._act|_reset
renamed toEnvironment.act|reset
, andEnvironment.act|reset
renamed toEnvironment.act_filter|reset_filter
*Batch
renamed toVec*
for consistency withVecEnv
Action
types now have separate fields for theactors
andactions/actees
rather than a single list that combines both, and anitems()
method that yields an iterator over the(actor, action)
pairsEntityType = str
andActionType = str
to make the intention of the API more obvious.