Open ttamyurek opened 1 month ago
This is an interesting point. On review I'm uncertain that this is unintended or unexpected default behavior. The simulator shouldn't necessarily delete an entity from the simulation once it is flagged as "done" by the training framework.
The current default behavior is to have the simulator continue to run using the last action supplied to the entity. This doesn't seem to be well-documented behavior so I do think we need to clarify what we want the default behavior to be and document it. My opinion is that default behavior should be one of the following:
@keatincf, @kyle-dunlap I'd love your input on this so we can update and document it.
Theoretically, entities that have completed their task wouldn't still be trying to apply actions to achieve a goal. Other entities that are still working on their tasks would need to avoid the done entity as they are still an obstacle within the environment.
I don't think applying the last action to the done entity makes sense as you could be applying a thrust action causing the entity to speed out of the environment.
I think applying some default action to maintain a safe state would make sense. That could be no action, assuming that no action maintains that safe state. It could be getting into a natural motion trajectory and then no action. No action from the start is probably the easiest, but I think the simulation already allows for specifying a default action that isn't necessarily doing nothing.
I agree that 1) applying the last action isn't a good idea, so I think 2) or 3) are the way to go. Applying a default action to maintain a safe state like you said Charles is interesting, but seems like maybe more effort than is needed (plus requires a specific solution for each new environment). Perhaps you could leave it open for a user to develop that.
Letting agents continue after a "done" is also an interesting case. If the done condition is a crash, do we want them to continue moving in the environment? In real life they might now be broken and their dynamics might have changed. What about going out of bounds? This is kind of like "disappearing off the radar" of the agents that are still in bounds. I can't immediately think of a non-shared done condition where you would really want the agent to continue moving in the environment, which is pushing me towards option 3.
I was writing up a few thoughts on handling of different done conditions, but realized architecturally that I'm not sure there's much we can do. The simulator doesn't get told about done agents, it just continues to be stepped as long as the environment is maintained. The actions also come from the environment. It would be up to the environment to provide an appropriate action for a done platform and I think some of that comes from the underlying RL library.
Note: The wording in this comment is going to get weird as the environment has platforms and the simulation has entities, but the simulator also keeps track of platforms to map between the environment and the simulation.
Took a further look at the underlying CoRL code base to see how the simulator is supposed to be used. It looks any platform in the environment that is done is supposed to be removed from the simulator. Looks like SafeRLSimulator
doesn't provide an implementation of delete_platform
. If we added that implementation, that would provide the simulator a way to know if a platform was done in the environment.
That brings us back to the question of what should a done platform do within the simulation. It looks like the current implementation would end up using the default action for the entity as the dictionary mapping entities to actions only contains actions for platforms in the environment. Looks like the default action of the entities used by base experiment configs is all zeroes (no thrust).
I think it would make sense for the done platform to still step in the simulation, but essentially not have any thrust. They'd maintain their current movement, which could theoretically cause a crash. However, that would allow other active agents to have to learn how to deal with that.
So, I would recommend just implementing the delete_platform
function in SafeRLSimulator
and the rest of the functionality can stay the same.
Note: The wording in this comment is going to get weird as the environment has platforms and the simulation has entities, but the simulator also keeps track of platforms to map between the environment and the simulation.
Took a further look at the underlying CoRL code base to see how the simulator is supposed to be used. It looks any platform in the environment that is done is supposed to be removed from the simulator. Looks like
SafeRLSimulator
doesn't provide an implementation ofdelete_platform
. If we added that implementation, that would provide the simulator a way to know if a platform was done in the environment.That brings us back to the question of what should a done platform do within the simulation. It looks like the current implementation would end up using the default action for the entity as the dictionary mapping entities to actions only contains actions for platforms in the environment. Looks like the default action of the entities used by base experiment configs is all zeroes (no thrust).
I think it would make sense for the done platform to still step in the simulation, but essentially not have any thrust. They'd maintain their current movement, which could theoretically cause a crash. However, that would allow other active agents to have to learn how to deal with that.
So, I would recommend just implementing the
delete_platform
function inSafeRLSimulator
and the rest of the functionality can stay the same.
I like that approach. We also need to document that behavior since its currently not well presented. I'll run this down, thanks for the input everyone!
The _step_entity_state() method in the SafeRL simulator steps all of the sim_entities regardless of their condition in the simulation. Since last entity actions are applied to spacecrafts in
_step_get_entity_actions()
even though the platform name is not in theplatforms_to_action
dictionary, it triggers issues in multi-agent scenarios such as weird agent behaviors and unexpected done triggers. This is an issue for multi-agent scenarios where the simulation has to continue even if some agents have hit a done but there are still other agents alive trying to complete the task. Feel free to remove this issue if you don't think this is a problem for this repo.I think platform entities should be stepped only if they are in the
platforms_to_action
dictionary. Or they should be stepped withaction=None
. However, non-platform entities such as sun, chief, etc. should still be stepped.