Closed. ejunprung closed this issue 4 years ago.
This is neat. With this, it works with goal- or time-based simulations and the user never has to write "is done", right?
We need to add to the documentation that, if the agent reaches the goal, it should call `finishSimulation()` (it might be called something else).
Yup, that's right. It'll allow us to get rid of the "done" field. They just need to end their simulations as they normally would in AnyLogic.
@ejunprung "done" definitions can potentially be complex. e.g. you could have a hard constraint on the number of steps taken. maybe what you're saying is that such conditions would then also end the simulation? if so the user would still have to define this logic somewhere, but maybe not in PathmindHelper. is that what you're saying?
I wonder how that would carry over to multi-agent scenarios, since in that case you'd need to check if all individual agents (corresponding to entries in the tuple) are done. I.e., in that case the end of the simulation and `isDone` are different things. See here for reference:
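For reference, RLlib's multi-agent API makes exactly this distinction: the environment returns one done flag per agent plus a reserved `__all__` key that ends the episode for everyone. A minimal sketch of that convention (plain Python dicts mirroring the convention, not calling into RLlib itself; the function name is illustrative):

```python
# Sketch of RLlib's multi-agent "dones" convention: per-agent done flags
# plus a reserved "__all__" key that ends the episode for all agents.
def episode_finished(dones):
    """The episode ends only when every agent is done, or __all__ is set."""
    if dones.get("__all__", False):
        return True
    agent_flags = [v for k, v in dones.items() if k != "__all__"]
    return bool(agent_flags) and all(agent_flags)

# Agent 0 is done, agent 1 is still running: the simulation must continue,
# which is why "simulation finished" and "agent done" diverge here.
print(episode_finished({"agent_0": True, "agent_1": False}))  # False
print(episode_finished({"agent_0": True, "agent_1": True}))   # True
```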
@pathmaykr Yup, correct. I've encountered two types of done conditions thus far.
Users call `finishSimulation()` to end the simulation whenever some condition is met. I've seen many users already do the above, and these both materialize in the AnyLogic engine returning `FINISHED` in the backend. Figured we could reduce the cognitive load by automatically setting done to `true` for the user. Just do what they already do by habit and we handle it from there. It's also just easier to articulate during technical support calls b/c we're only speaking "AnyLogic".
However, I forgot about multi-agent scenarios so I'll need to test that...
yeah, let's be careful about removing `isDone` prematurely. it's definitely good to reduce cognitive load and it's a great coincidence that for single agents the AnyLogic concept of "simulation done" has the same meaning as "RL agent done", but this might lead to conceptual misunderstandings when users start using multiple agents.
Yup, that was a good catch. I'll make sure to test that and check with you to confirm my understanding.
FWIW, calling `finishSimulation()` inside an AnyLogic event trigger is essentially all the done condition field does, so there's no loss in functionality (in the single-agent scenario at least).
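That equivalence can be illustrated with a toy loop (hypothetical Python stand-ins for the AnyLogic engine, not its real API; in an actual model the condition lives in an event's trigger and `finishSimulation()` is the engine call):

```python
# Toy sketch: a condition checked each step ends the run, the way an
# AnyLogic event calling finishSimulation() would. Names are illustrative.
class Engine:
    RUNNING, FINISHED = "RUNNING", "FINISHED"

    def __init__(self):
        self.state = Engine.RUNNING

    def finish_simulation(self):
        self.state = Engine.FINISHED

def run(engine, goal_reached, max_steps=100):
    for step in range(max_steps):
        if goal_reached(step):
            engine.finish_simulation()  # what the done-condition field did
            break
    # the backend then derives "done" from the engine state alone
    return engine.state == Engine.FINISHED

print(run(Engine(), lambda step: step == 10))  # True
print(run(Engine(), lambda step: False))       # False (never reaches goal)
```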
Closing since Sam's handling the architecture going forward.
Let's keep this open though?
@saudet @slinlee Quick question. After switching to this way of setting "done", how will it impact the user experience? They would call `finishSimulation(this);` or something similar directly in their model. Does that align with what you're thinking too?

Right, the field probably needs to stay there when there is more than 1 agent. (It could probably be hidden when there is only 1 agent though.) RLlib enforces some conditions when we skip some steps, like you found out when testing pull https://github.com/SkymindIO/nativerl/pull/140#issuecomment-680267933. I'm not sure how that's all going to look. In any case, as found by @SaharEs when testing pull https://github.com/SkymindIO/nativerl/pull/149#issuecomment-686089065, one thing we can do right away is change the default to `engine.getState() == Engine.State.FINISHED`.
Got it. Changing the default to `engine.getState() == Engine.State.FINISHED` right away sounds like a good start.
I have an idea: we could use the `'no_done_at_end': True` flag, which overrides `done = true` from users: https://github.com/ray-project/ray/blob/master/rllib/evaluation/sampler.py#L836
That effectively prevents us from terminating any agents in an environment, but we can still skip them until the end, and we should get results that are pretty much equivalent, just that we might be using more resources hanging onto "done" data in RLlib. We can wait and see if that becomes a problem or not though. This way we could get rid of the `Done` field and only have the `Skip` field. Maybe that would be a good way to go about it?
@SaharEs @johnnyL7 @maxpumperla Opinions? Does having agents that never terminate ("infinite horizon") impact learning algorithms in any meaningful way?
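A rough sketch of the effect being described (illustrative names, not RLlib's actual sampler internals): with `no_done_at_end` enabled, per-agent done flags from the environment get overridden to `False`, while the episode-level `__all__` flag still ends the episode for everyone.

```python
# Sketch of what 'no_done_at_end': True does on the sampling side: per-agent
# done flags reported by the environment are overridden, so agents are never
# terminated individually (they'd be skipped instead), but the episode-level
# "__all__" flag is left alone. Illustrative only, not RLlib's real code.
def effective_dones(env_dones, no_done_at_end):
    out = {}
    for agent, done in env_dones.items():
        if agent == "__all__":
            out[agent] = done   # episode-level done is untouched
        elif no_done_at_end:
            out[agent] = False  # override user-set done = true
        else:
            out[agent] = done
    return out

env_dones = {"agent_0": True, "agent_1": False, "__all__": False}
print(effective_dones(env_dones, no_done_at_end=True))
# {'agent_0': False, 'agent_1': False, '__all__': False}
```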
Actually, now that I think about it, the whole point of having a "done" flag per agent is to be able to bring new agents in dynamically during the simulation. Since we're not allowing new agents at runtime, but merely a fixed number of them, it doesn't matter that we can't throw them away. The resources they consume aren't going to grow.
@saudet I think we should use `'no_done_at_end': True` for multi-agent scenarios only to start. I'm getting worried about unexpected consequences in our existing simulations, so we'll need to poke at it more.
Yes, that's what I was thinking of doing. It's not used by RLlib in single-agent mode anyway.
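That conditional might be sketched like this (hypothetical helper and config assembly; the real change would live in nativerl's training setup):

```python
# Hypothetical sketch of gating 'no_done_at_end' on the agent count, so
# existing single-agent simulations keep their current behavior. The flag
# name matches RLlib's config option; everything else is illustrative.
def build_rllib_config(num_agents, base_config=None):
    config = dict(base_config or {})
    if num_agents > 1:
        # RLlib ignores this flag in single-agent mode anyway, but setting
        # it only for multi-agent keeps the intent explicit.
        config["no_done_at_end"] = True
    return config

print(build_rllib_config(1))  # {}
print(build_rllib_config(4))  # {'no_done_at_end': True}
```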
I've merged pull #161 with these changes.
Test this later: https://github.com/SkymindIO/nativerl/commit/ee31b945b5fa8d1fadd8f6350071ef18cf784a35