Different behavior in --fast and --slow during training

Junggy commented 5 years ago

Hello,

I have trained with both --slow and --fast (or with neither flag specified), and I found that they behave differently.

First, I trained my agent with the --slow option (max steps = 1500). With that option, most episodes end on either my bad condition or my good condition. However, when I then trained with the --fast option (or with neither flag specified) at time scale = 100, using --load to start from the pre-trained agent (trained with --slow), 100% of the episodes were terminated by exceeding max steps.

I expected my agent to do the same task, just faster (i.e. finish episodes by the good or bad condition, not by exceeding max steps), but so far that is not what happens. (FYI, I enabled on-demand decisions and call this.RequestDecision() in Update() so that decisions are synced with the frame.)

Is this the expected behavior? And what could cause the difference?

ervteng commented 5 years ago

Hi @Junggy, one thing that can happen with a higher timescale is that the physics becomes less accurate. Is your game very physics-dependent (e.g. collisions)? If so, since you're taking larger steps in the simulation, you might run into differences in behavior.

Junggy commented 5 years ago

@ervteng my game has no physics and depends purely on visual observations. It has to observe each frame -> take a step -> update the environment -> observe the new frame -> take a step ... The important thing is to take one step per frame update, meaning the environment must not change before the step is taken. (This is the reason I put RequestDecision() in Update().)

However, with time-scale=100 and RequestDecision() in Update(), frames seem to be skipped (at least in the Unity Editor): the game runs 100 times faster, but the frame rate stays the same as with time-scale=1. It seems as if the observation stays the same while many steps are taken. Maybe this is the reason for the different behavior?

tomatenbrei commented 5 years ago

Disclaimer: The following statements are based on some testing, please correct me if I'm wrong.

The problem here is that Update() does not actually scale with timeScale. Only FixedUpdate does.

Let's assume you set the targetFrameRate to 60. As far as I know, the simulation will then try to achieve 60 Update calls per real time second and 50 FixedUpdate calls per gametime/emulated second.

This means that when your game runs with timeScale 100 and you target 60 FPS (and your machine is strong enough), you will end up with only 60 Update calls per real-time second (with factor 100, this means only about 0.6 Update calls per emulated second) but 50 * 100 = 5000 FixedUpdate calls per real-time second (with factor 100, this means the targeted 50 calls per emulated second). When your machine is not able to run the scene with your provided timescale at the desired framerate, things get even worse.
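The arithmetic above can be written out as a small Python sketch. This is only an illustration, not Unity or ml-agents code: the function name `call_rates` and its parameters are hypothetical, and it assumes the machine keeps up with both the target frame rate and the time scale.

```python
def call_rates(time_scale, target_frame_rate=60, fixed_rate=50):
    """Return Update/FixedUpdate call rates per real and per simulated second.

    Assumption: Update runs target_frame_rate times per real second and
    FixedUpdate runs fixed_rate times per simulated second.
    """
    return {
        # Update is tied to real time, so it gets rarer in simulated time.
        "update_per_real_s": target_frame_rate,
        "update_per_sim_s": target_frame_rate / time_scale,
        # FixedUpdate is tied to simulated time, so it gets denser in real time.
        "fixed_per_sim_s": fixed_rate,
        "fixed_per_real_s": fixed_rate * time_scale,
    }


rates = call_rates(time_scale=100)
print(rates["update_per_sim_s"])  # 0.6
print(rates["fixed_per_real_s"])  # 5000
```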

In my opinion, this is one of the main flaws of the environment simulation in ml-agents. It would be really nice if there were a "virtual clock" that ensures Update and FixedUpdate are both called at some specified rate, independent of real time and of the performance of the system.

Junggy commented 5 years ago

@tomatenbrei so you mean: FixedUpdate() is synced with the time scale (and maybe also with the academy's step?), while Update() is synced with rendering. So if I put nothing in FixedUpdate(), set time scale = 100, and sync action decisions with Update(), then the agent steps 100 times faster while doing nothing, and only decides on an action when a frame is rendered (i.e. 60 decisions per second)?

In this case, increasing the time scale only has the effect of ending episodes early by exceeding max steps, right? (i.e. time scale = 100 -> the episode finishes 100 times earlier)

If so, it makes sense ...
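As a rough check of this reasoning, here is a minimal Python sketch of how many Update calls (and therefore decision requests made in Update) fit into one episode before max steps is reached. It assumes that max steps is counted in FixedUpdate-driven steps, that Update runs at 60 calls per real second, and that FixedUpdate runs at 50 calls per simulated second. The function name `update_calls_per_episode` is hypothetical.

```python
def update_calls_per_episode(max_steps, time_scale, frame_rate=60, fixed_rate=50):
    """Estimate Update calls available before max_steps FixedUpdate steps elapse.

    Assumption: one step is counted per FixedUpdate call, and the machine
    keeps up with both frame_rate and time_scale.
    """
    # Real seconds per episode = max_steps / (fixed_rate * time_scale);
    # multiplying by frame_rate gives the Update calls in that time.
    return max_steps * frame_rate / (fixed_rate * time_scale)


print(update_calls_per_episode(1500, time_scale=1))    # 1800.0
print(update_calls_per_episode(1500, time_scale=100))  # 18.0
```

Under these assumptions, an episode with max steps = 1500 leaves room for roughly 18 decisions at time scale = 100, which would be consistent with every episode ending on max steps.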

tomatenbrei commented 5 years ago

@Junggy

The step function of the academy (MLAgents.Academy.AcademyStep) is actually called by MLAgents.Academy.EnvironmentStep which is called by FixedUpdate, so they are directly "synced".

The rendering of visual observations is not tied to Update. It is actually done in FixedUpdate as well (because the visual observation is processed in MLAgents.Agent.ObservationToTexture, which is executed inside the agent's observation-action step inside MLAgents.Academy.EnvironmentStep).

Maybe it already helps if you place your RequestAction call somewhere inside FixedUpdate.

Junggy commented 5 years ago

@tomatenbrei

So:

Academy step & observation rendering -> synced with FixedUpdate()
Rendering of the Scene in Unity -> synced with Update()

So what I see in the Unity Editor's scene (or game display) is not what the agent sees. Is this what you mean?

Since I want to sync action decisions with the frame, I unticked on-demand decisions and set the decision interval to 1 (I saw somewhere in the documentation that action decisions are synced with FixedUpdate by default), with target frame rate = -1. But the Unity scene does not seem to update as often as my time scale would suggest. (In fact, it renders nothing when I set time scale = 100. But it is definitely running: I changed the TensorFlow code to output some results for the first few steps, and it does output them.) Does this mean that observations, steps, and decisions take place according to the time scale, and only the rendering of the Unity scene (or game display) is too slow to show anything, because they are not synced?

tomatenbrei commented 5 years ago

"So what I see in the Unity Editor's scene (or game display) is not what the agent sees. Is this what you mean?"

Yes, I think this is the case. You see the regular rendering of the scene, the agents see what their cameras see during the rendering call inside FixedUpdate.

"Does this mean that observations, steps, and decisions take place according to the time scale, and only the rendering of the Unity scene (or game display) is too slow to show anything, because they are not synced?"

Yes, they use the time scale. But I do not know how the rendering of the Unity Editor is handled.

Maybe for future reference, here are some numbers which illustrate what happens. I recorded the number of Update and FixedUpdate calls after 5000 FixedUpdate steps and checked what happens when I change the timeScale or targetFrameRate:

Target Frame Rate = -1

Current timeScale: 1.00
Update: 106.38 FPS (59.99 scaled) --- 5999 calls
FixedUpdate: 88.67 FPS (50.00 scaled)--- 5000 calls
Delta: 56.39, DeltaScaled: 100.00, real timeScale: 1.77

Current timeScale: 10.00
Update: 43.14 FPS (5.99 scaled) --- 599 calls
FixedUpdate: 360.08 FPS (50.00 scaled)--- 5000 calls
Delta: 13.89, DeltaScaled: 100.00, real timeScale: 7.20

Current timeScale: 100.00
Update: 6.58 FPS (0.59 scaled) --- 59 calls
FixedUpdate: 557.30 FPS (50.00 scaled)--- 5000 calls
Delta: 8.97, DeltaScaled: 100.00, real timeScale: 11.15

Target Frame Rate = 60

Current timeScale: 1.00
Update: 57.52 FPS (59.99 scaled) --- 5999 calls
FixedUpdate: 47.94 FPS (50.00 scaled)--- 5000 calls
Delta: 104.30, DeltaScaled: 100.00, real timeScale: 0.96

(other timeScale values produce identical results compared to Target Frame Rate == -1)

One can see that with increasing timeScale, the number of FixedUpdate calls stays at 50 per simulated second but the number of Update calls decreases. Note that when I set my target framerate to 60, the total number of Update / FixedUpdate calls stays the same but I need more real time (104 seconds instead of 56 seconds) until the 5000 environment steps are done.
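The ratio of Update to FixedUpdate calls in these measurements can be approximated with a small Python simulation of a frame loop with a fixed-timestep catch-up. This is a sketch under stated assumptions, not Unity's actual loop: the function name `simulate` is hypothetical, each rendered frame is assumed to advance unscaled time by 1/60 s, and the fixed timestep is assumed to be 1/50 s of simulated time.

```python
from fractions import Fraction


def simulate(time_scale, fixed_steps_target=5000,
             frame_dt=Fraction(1, 60), fixed_dt=Fraction(1, 50)):
    """Count Update and FixedUpdate calls until fixed_steps_target is reached.

    Assumption: every frame advances game time by frame_dt * time_scale, then
    FixedUpdate runs as often as needed to catch up in steps of fixed_dt.
    Fractions keep the time arithmetic exact.
    """
    game_time = Fraction(0)   # simulated time reached by the frame loop
    fixed_time = Fraction(0)  # simulated time already consumed by FixedUpdate
    updates = 0
    fixed_updates = 0
    while fixed_updates < fixed_steps_target:
        game_time += frame_dt * time_scale
        # Run FixedUpdate until it has caught up with the scaled game time.
        while (fixed_time + fixed_dt <= game_time
               and fixed_updates < fixed_steps_target):
            fixed_time += fixed_dt
            fixed_updates += 1
        updates += 1  # one Update call per rendered frame
    return updates, fixed_updates


for scale in (1, 10, 100):
    print(scale, simulate(scale))
# 1 (6000, 5000)
# 10 (600, 5000)
# 100 (60, 5000)
```

These counts are within one call of the measured 5999, 599, and 59 Update calls per 5000 FixedUpdate calls.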

xiaomaogy commented 5 years ago

@ervteng

harperj commented 5 years ago

Hi all -- this issue has been inactive for some time so I'm going to close it. Feel free to reopen or create a new issue if you have more to discuss.


Unity-Technologies / ml-agents

Different behavior in --fast and --slow during training #2038