Junggy closed this issue 5 years ago.
Hi @Junggy, one thing that can happen with timescale is the physics becomes less accurate. Is your game very physics-dependent (e.g. collisions)? If so, since you're taking larger steps in the simulation, you might run into differences in behavior.
@ervteng my game has no physics and depends purely on visual observation. It has to observe each frame -> take a step -> update the environment -> observe the new frame -> take a step, and so on. The important point is that a step is taken for every frame update, meaning the environment must not change before the step is taken (this is the reason I put RequestDecision() in Update()).
However, with time scale = 100 and RequestDecision() in Update(), frames seem to be skipped, at least in the Unity player (i.e. the game runs 100 times faster, but the frame rate stays the same as at time scale = 1). It seems the observation stays the same while a lot of steps are taken. Maybe this is the reason I am experiencing the different behavior?
Disclaimer: the following statements are based on some testing; please correct me if I'm wrong.
The problem here is that `Update()` does not actually scale with the `timeScale`; only `FixedUpdate()` does.
Let's assume you set the `targetFrameRate` to 60. As far as I know, the simulation will then try to achieve 60 `Update` calls per real-time second and 50 `FixedUpdate` calls per game-time/simulated second.
This means that when your game runs with `timeScale` 100 and you target 60 FPS (and your machine is strong enough), you end up with only 60 `Update` calls per real second (at factor 100, that is only about 0.6 `Update` calls per simulated second), but 50 * 100 = 5,000 `FixedUpdate` calls per real second (at factor 100, the targeted 50 calls per simulated second). When your machine cannot run the scene at the desired frame rate with the given time scale, things get even worse.
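The scaling described above can be sketched numerically. This is a back-of-the-envelope Python model, not Unity code, and it assumes the machine keeps up with the target frame rate: `Update` runs once per rendered frame in real time, while `FixedUpdate` runs once per `fixedDeltaTime` (default 0.02 s) of scaled time.

```python
# Toy model (not Unity code): Update frequency is fixed in *real* time,
# FixedUpdate frequency is fixed in *scaled* (simulated) time.
def calls_per_simulated_second(time_scale, target_fps=60, fixed_delta=0.02):
    updates = target_fps / time_scale   # real frames per simulated second
    fixed_updates = 1.0 / fixed_delta   # unaffected by timeScale: always 50
    return updates, fixed_updates

for ts in (1, 10, 100):
    u, f = calls_per_simulated_second(ts)
    print(f"timeScale={ts:>3}: {u:5.1f} Update / {f:.0f} FixedUpdate "
          f"calls per simulated second")
```

At `timeScale` 100 this yields 0.6 `Update` against 50 `FixedUpdate` calls per simulated second, matching the "scaled" numbers measured further down in this thread.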
In my opinion, this is one of the main flaws of the environment simulation in ml-agents. It would be really nice if there were a "virtual clock" which ensures that `Update`/`FixedUpdate` are both called at some specified rate, independent of real time and the performance of the system.
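For illustration only, here is a minimal Python sketch of what such a virtual clock could look like; `VirtualClock` and its callback names are invented, and nothing like this exists in ml-agents. Both callbacks are scheduled on one simulated timeline and fired in timestamp order, so their relative rates can never drift apart, no matter how slowly the host machine runs:

```python
class VirtualClock:
    """Hypothetical scheduler (sketch, not part of ml-agents): drives
    update and fixed_update from one shared simulated timeline."""

    def __init__(self, update_rate=60, fixed_rate=50):
        self.update_rate = update_rate
        self.fixed_rate = fixed_rate

    def run(self, sim_seconds, update, fixed_update):
        u = f = 0  # index of the next pending call for each callback
        while True:
            t_u = u / self.update_rate  # virtual time of next update
            t_f = f / self.fixed_rate   # virtual time of next fixed_update
            if min(t_u, t_f) >= sim_seconds:
                break
            if t_f <= t_u:              # fire callbacks in timestamp order
                fixed_update(t_f)
                f += 1
            else:
                update(t_u)
                u += 1


calls = {"update": 0, "fixed": 0}
clock = VirtualClock()
clock.run(1.0,
          update=lambda t: calls.update({"update": calls["update"] + 1}),
          fixed_update=lambda t: calls.update({"fixed": calls["fixed"] + 1}))
print(calls)  # exactly 60 update and 50 fixed_update calls per virtual second
```

Because the callback times are derived from integer call counts rather than accumulated wall-clock deltas, the 60:50 ratio holds exactly for any horizon.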
@tomatenbrei so you mean: FixedUpdate() is synced with the time scale (and probably also the academy's step?), while Update() is synced with rendering. So if I put nothing in FixedUpdate() at time scale = 100 and sync the action decision with Update(), the agent steps 100 times faster while doing nothing, and only decides an action when rendering is done (i.e. 60 decisions per second)?
In that case, increasing the time scale would only have the effect of finishing the episode earlier by exceeding max steps, right? (i.e. time scale = 100 -> the episode finishes 100 times earlier)
If so, it makes sense...
@Junggy
The step function of the academy (`MLAgents.Academy.AcademyStep`) is actually called by `MLAgents.Academy.EnvironmentStep`, which is called by `FixedUpdate`, so they are directly "synced".
The agents' observation rendering is not tied to `Update`; it is actually done in `FixedUpdate` as well (because the visual observation is processed in `MLAgents.Agent.ObservationToTexture`, which is executed inside the agent's observation-action step inside `MLAgents.Academy.EnvironmentStep`).
Maybe it already helps when you place your `RequestAction` somewhere inside `FixedUpdate`.
@tomatenbrei
so the academy step & observation rendering are synced with FixedUpdate(), while the rendering of the scene in Unity is synced with Update(). So what I see in the Unity scene (or game display) is not what the agent sees, is this what you mean?
Since I want to sync the action decision with the frame, I un-ticked on-demand decisions and set the decision interval to 1 (I saw somewhere in the documentation that the action decision is synced with FixedUpdate by default), with target frame rate = -1. But the Unity scene does not seem to update as frequently as my time scale suggests. (It rather renders nothing when I set time scale = 100. But it definitely runs, because I changed the TensorFlow code to output some results at the first few steps, and it does output results.) Does this mean that observations, steps, and decisions take place according to the time scale, but the rendering of the Unity scene (or game display) is too slow to show any result, because they are not synced?
> So what I see in the Unity scene (or game display) is not what the agent sees, is this what you mean?
Yes, I think this is the case. You see the regular rendering of the scene; the agents see what their cameras see during the rendering call inside `FixedUpdate`.
> Does this mean that observations, steps, and decisions take place according to the time scale, but the rendering of the Unity scene (or game display) is too slow to show any result, because they are not synced?
Yes, they use the time scale. But I do not know how the rendering of the Unity Editor is handled.
Maybe for future reference, here are some numbers which illustrate what happens. I recorded the number of `Update` and `FixedUpdate` calls after 5000 `FixedUpdate` steps and checked what happens when I change the `timeScale` or `targetFrameRate`:
```
Target Frame Rate = -1

Current timeScale: 1.00
Update:      106.38 FPS (59.99 scaled) --- 5999 calls
FixedUpdate:  88.67 FPS (50.00 scaled) --- 5000 calls
Delta: 56.39, DeltaScaled: 100.00, real timeScale: 1.77

Current timeScale: 10.00
Update:       43.14 FPS  (5.99 scaled) --- 599 calls
FixedUpdate: 360.08 FPS (50.00 scaled) --- 5000 calls
Delta: 13.89, DeltaScaled: 100.00, real timeScale: 7.20

Current timeScale: 100.00
Update:        6.58 FPS  (0.59 scaled) --- 59 calls
FixedUpdate: 557.30 FPS (50.00 scaled) --- 5000 calls
Delta: 8.97, DeltaScaled: 100.00, real timeScale: 11.15

Target Frame Rate = 60

Current timeScale: 1.00
Update:      57.52 FPS (59.99 scaled) --- 5999 calls
FixedUpdate: 47.94 FPS (50.00 scaled) --- 5000 calls
Delta: 104.30, DeltaScaled: 100.00, real timeScale: 0.96
```
(other timeScale values produce identical results compared to Target Frame Rate == -1)
One can see that with increasing `timeScale`, the number of `FixedUpdate` calls stays at 50 per simulated second but the number of `Update` calls decreases. Note that when I set my target frame rate to 60, the total number of `Update`/`FixedUpdate` calls stays the same, but I need more real time (104 seconds instead of 56) until the 5000 environment steps are done.
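A quick sanity check of these measurements, under two assumptions: `fixedDeltaTime` = 0.02, and an effective real frame rate of about 60 FPS (which the "scaled" columns suggest even for Target Frame Rate = -1, probably due to vsync). 5000 `FixedUpdate` steps span 100 simulated seconds, and the number of `Update` calls in that window is roughly the real frame rate times the real duration of the window:

```python
# Arithmetic model of the measurements above, assuming fixedDeltaTime = 0.02
# and an effective real frame rate of ~60 FPS.
fixed_delta, real_fps, steps = 0.02, 60, 5000
sim_seconds = steps * fixed_delta            # 5000 FixedUpdates = 100 sim s
for time_scale, observed in [(1, 5999), (10, 599), (100, 59)]:
    real_seconds = sim_seconds / time_scale  # real time the window spans
    expected = real_fps * real_seconds       # frames (Update calls) in it
    print(f"timeScale={time_scale:>3}: expected ~{expected:.0f} Update "
          f"calls, observed {observed}")
```

The consistent off-by-one (6000 vs. 5999, 600 vs. 599, 60 vs. 59) is just boundary counting at the edges of the measurement window.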
@ervteng
Hi all -- this issue has been inactive for some time so I'm going to close it. Feel free to reopen or create a new issue if you have more to discuss.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hello,
currently, I have trained with both --slow and --fast (or unspecified) and found some differences in behavior.
Firstly, I trained my agent with the --slow option (max steps = 1500). While training with that option, most episodes finished with either my bad condition or my good condition. However, when I then trained with the --fast option (or unspecified) at time scale = 100, using --load to start from the pre-trained agent (trained with --slow), 100% of the episodes were terminated by exceeding max steps.
I expected my agent to do the same task just faster (i.e. finish episodes by either the good or the bad condition, not by exceeding max steps), but for now it doesn't seem like that. (FYI, I enabled on-demand decisions and put this.RequestDecision() in Update() so that it is synced with the frame.)
Is it supposed to be like this? And what is a possible reason for the different behavior?