PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0

Automatically set done = true at simulation end #41

Closed. ejunprung closed this issue 4 years ago.

ejunprung commented 4 years ago

Test this later: https://github.com/SkymindIO/nativerl/commit/ee31b945b5fa8d1fadd8f6350071ef18cf784a35

slinlee commented 4 years ago

This is neat. With this, it works with both goal-based and time-based simulations, and the user never has to write "is done", right?

We need to add to the documentation that when the agent reaches its goal, the model should call finishSimulation() (it might be called something else).

ejunprung commented 4 years ago

Yup, that's right. It'll allow us to get rid of the "done" field. Users just need to end their simulations as they normally would in AnyLogic.

pathmaykr commented 4 years ago

@ejunprung "done" definitions can potentially be complex. e.g. you could have a hard constraint on the number of steps taken. maybe what you're saying is that such conditions would then also end the simulation? if so the user would still have to define this logic somewhere, but maybe not in PathmindHelper. is that what you're saying?

I wonder how that would carry over to multi-agent scenarios, since in that case you'd need to check whether all individual agents (corresponding to entries in the tuple) are done. In other words, in that case the end of the simulation and isDone are different things. See here for reference:

https://github.com/SkymindIO/nativerl/blob/master/nativerl/src/main/java/ai/skymind/nativerl/RLlibHelper.java#L438
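
To make that distinction concrete, here's a minimal self-contained sketch (all names are hypothetical, not nativerl API): in multi-agent mode each agent carries its own done flag, and the episode as a whole ends only when every flag is set.

```java
// Hypothetical sketch, not actual nativerl code: per-agent "done" flags
// and the simulation-level end condition are separate concepts.
public class MultiAgentDoneSketch {
    // The episode ends only when every individual agent is done.
    static boolean episodeDone(boolean[] agentDone) {
        for (boolean done : agentDone) {
            if (!done) {
                return false; // at least one agent is still active
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] dones = {true, false, true};
        // Prints "false": one agent is still running, so the simulation
        // itself is not over even though two agents have finished.
        System.out.println(episodeDone(dones));
    }
}
```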

ejunprung commented 4 years ago

@pathmaykr Yup, correct. I've encountered two types of done conditions thus far.

  1. Goal-based, in which AnyLogic users call finishSimulation() to end the simulation whenever some condition is met.
  2. Time-based, in which the simulation expires once its model time runs out.

I've seen many users do the above already, and both cases materialize as the AnyLogic engine returning FINISHED in the backend. I figured we could reduce cognitive load by automatically setting done to true for the user: they just do what they already do by habit, and we handle it from there. It's also easier to articulate during technical support calls because we're only speaking "AnyLogic".

However, I forgot about multi-agent scenarios so I'll need to test that...
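
For the goal-based case (1. above), the model-side code is just a guarded call to AnyLogic's finishSimulation(). A minimal sketch of the pattern as it would sit in an AnyLogic event's Action field (the goal condition and variable names here are hypothetical):

```java
// Inside an AnyLogic event's Action field (plain Java in the model).
// The goal condition below is a made-up example; the finishSimulation()
// call is what moves the engine to Engine.State.FINISHED, which this
// change would then pick up as done = true.
if (completedOrders >= targetOrders) { // hypothetical goal condition
    finishSimulation();
}
```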

pathmaykr commented 4 years ago

Yeah, let's be careful about removing isDone prematurely. It's definitely good to reduce cognitive load, and it's a great coincidence that for single agents the AnyLogic concept of "simulation done" has the same meaning as "RL agent done", but this might lead to conceptual misunderstandings once users start using multiple agents.

ejunprung commented 4 years ago

Yup, that was a good catch. I'll make sure to test that and check with you to confirm my understanding.

FWIW, calling finishSimulation() inside an AnyLogic event trigger is essentially all the done condition field does, so there's no loss of functionality (in the single-agent scenario, at least).

ejunprung commented 4 years ago

Closing since Sam's handling the architecture going forward.

saudet commented 4 years ago

Let's keep this open though?

ejunprung commented 4 years ago

@saudet @slinlee Quick question. After switching to this way of setting "done", how will it impact the user experience?

  1. In my mind, we would remove the "done" field from Pathmind Helper. Then, for "done" conditions other than model time (e.g. a goal), users would call finishSimulation() or something similar directly in their model. Does that align with what you're thinking too?
  2. How will multi-agent scenarios work?

saudet commented 4 years ago

Right, the field probably needs to stay when there is more than one agent. (It could probably be hidden when there is only one agent, though.) RLlib enforces some conditions when we skip steps, like you found out when testing pull https://github.com/SkymindIO/nativerl/pull/140#issuecomment-680267933. I'm not sure how that's all going to look. In any case, as @SaharEs found when testing pull https://github.com/SkymindIO/nativerl/pull/149#issuecomment-686089065, one thing we can do right away is change the default to engine.getState() == Engine.State.FINISHED.

ejunprung commented 4 years ago

Got it. Changing the default to engine.getState() == Engine.State.FINISHED right away sounds like a good start.
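
For reference, a sketch of what that default could look like (the wrapper method name is invented; Engine, getState(), and Engine.State.FINISHED are AnyLogic API, so compiling this requires the AnyLogic runtime on the classpath):

```java
import com.anylogic.engine.Engine;

// Hypothetical sketch of the proposed default "done" condition: the episode
// is done exactly when the AnyLogic engine reports the run has finished.
public class DefaultDoneSketch {
    static boolean isDone(Engine engine) {
        return engine.getState() == Engine.State.FINISHED;
    }
}
```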

saudet commented 4 years ago

I have an idea: we could use the 'no_done_at_end': True flag, which overrides done = true from users: https://github.com/ray-project/ray/blob/master/rllib/evaluation/sampler.py#L836 That effectively prevents us from terminating any agents in an environment, but we can still skip them until the end, and we should get results that are pretty much equivalent; we might just use more resources hanging onto "done" data in RLlib. We can wait and see whether that becomes a problem. This way we could get rid of the Done field and keep only the Skip field. Maybe that would be a good way to go about it?

@SaharEs @johnnyL7 @maxpumperla Opinions? Does having agents that never terminate ("infinite horizon") impact learning algorithms in any meaningful way?

saudet commented 4 years ago

Actually, now that I think about it, the whole point of having a "done" flag per agent is to be able to bring new agents in dynamically during the simulation. Since we don't allow new agents at runtime, only a fixed number of them, it doesn't matter that we can't throw any away. The resources they consume aren't going to grow.

ejunprung commented 4 years ago

@saudet I think we should use 'no_done_at_end': True for multi-agent scenarios only, to start. I'm getting worried about unexpected consequences in our existing simulations, so we'll need to poke at it more.

saudet commented 4 years ago

Yes, that's what I was thinking of doing. It's not used by RLlib in single-agent mode anyway.
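
A rough sketch of gating the flag on agent count, assuming the Python training config is emitted as text (all names here are invented for illustration; 'no_done_at_end' is the real RLlib config key discussed above):

```java
// Hypothetical sketch: if the RLlib training script is generated as a
// string, enabling the flag amounts to emitting one extra config key,
// and only when the model has more than one agent.
public class NoDoneAtEndSketch {
    static String configFragment(int numAgents) {
        StringBuilder config = new StringBuilder("config = {\n");
        if (numAgents > 1) {
            // 'no_done_at_end': True keeps RLlib from terminating individual
            // agents early; they are skipped until the environment itself ends.
            config.append("    \"no_done_at_end\": True,\n");
        }
        config.append("}\n");
        return config.toString();
    }

    public static void main(String[] args) {
        System.out.print(configFragment(3)); // multi-agent: flag is emitted
        System.out.print(configFragment(1)); // single-agent: flag is omitted
    }
}
```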

saudet commented 4 years ago

I've merged pull #161 with these changes.