Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Discussion: Long Episode Duration #132

Closed: JaCoderX closed this issue 3 years ago

JaCoderX commented 4 years ago

Follow-up discussion to #131:

Until now I have been working with a couple of data streams on the same 1-min timeframe, but recently I had an idea to incorporate a different data stream (data as information) with a timeframe of 1 day.

The first stage was to be able to work with two timeframes; now the question is how to get value out of it. Having more than one timeframe (especially with a big time gap between them) requires a slightly different mindset. To make the best use of the longer-timeframe data, I expanded the episode_duration param drastically.
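
For context, this is roughly what I mean by expanding the episode duration (just a sketch with a placeholder filename and values, assuming the episode_duration dict form used in the btgym examples):

```python
# Rough sketch with placeholder filename/values, assuming the episode_duration
# dict form used in the btgym examples: stretch each episode over many days of
# 1-min data so the 1-day stream contributes more than a handful of points.
from btgym import BTgymDataset

dataset = BTgymDataset(
    filename='./data/my_1min_data.csv',                       # placeholder path
    episode_duration={'days': 30, 'hours': 0, 'minutes': 0},  # much longer than the usual few days
)
```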

@Kismuz, I have a couple of questions regarding the use of a long episode duration:

  1. Does working with a longer episode affect the policy update frequency? In other words, does the policy update occur after x number of env steps, or is it somehow bound to episode termination? What is the param that controls it?

  2. Usually, to see how the learning is progressing, I look at the episode actions in tensorboard, but with a long duration it becomes unreadable (too dense). I've tried changing render_size_episode but it doesn't help a lot. Is there a better way to view it, or some trick to make it high resolution?

Kismuz commented 4 years ago

@JacobHanouna ,

Does working with a longer episode affect the policy update frequency?

No, the policy update frequency is controlled by the rollout_length parameter: https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/aac.py#L80
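
For example (a minimal sketch assuming the trainer_config pattern used in the btgym examples; other required kwargs omitted), the value is simply passed through the trainer kwargs:

```python
# Minimal sketch, assuming the trainer_config pattern from the btgym examples
# (other required kwargs omitted). rollout_length sets how many environment
# steps are collected before each policy update; it is independent of
# episode_duration.
from btgym.algorithms import A3C

trainer_config = dict(
    class_ref=A3C,
    kwargs=dict(
        rollout_length=20,  # env steps per rollout, i.e. per SGD update
        # ... env_config, policy_config, learning rate, etc.
    ),
)
```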

Is there a better way to view it, or some trick to make it high resolution?

render_size does affect the image size, but doesn't affect its presentation in tensorboard; one possible way to get it bigger is to hack the rendering module and use the functionality to save the image to disk along with passing it to tensorboard (this is a built-in backtrader feature; it has been disabled due to execution time concerns): https://github.com/Kismuz/btgym/blob/master/btgym/rendering/plotter.py#L26
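
As a lighter-weight workaround (a sketch only, assuming env.render('episode') returns an RGB numpy array as in the examples), one can also dump the rendered array to disk directly and inspect it outside tensorboard:

```python
# Sketch only (not the built-in backtrader save path mentioned above); assumes
# env.render('episode') returns an RGB numpy array and that `env` is an
# already-running BTgymEnv instance.
import imageio

rgb = env.render('episode')                 # rendered episode image as an array
imageio.imwrite('episode_render.png', rgb)  # view at full resolution outside tensorboard
```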

JaCoderX commented 4 years ago

@Kismuz,

the policy update frequency is controlled by the rollout_length parameter

Can you hypothesize what the impact on learning would be when choosing different rollout_length values? I'm trying to understand how to best use the long 1-day 'data as info', or maybe some form of 'long-term auxiliary' reward?

Kismuz commented 4 years ago

@JacobHanouna , The impact is two-fold: the first part is related to the Bellman update and value estimation, the second to the algorithm implementation:

  1. Theoretically, the Bellman update should be insensitive to rollout length; in practice it does matter: the rollout should be at least several (4-5) steps long, so we collect some real reward before we append the recursive tail value based on our current V estimate (a small numeric sketch follows this list);

  2. Policy update frequency: we run an SGD step once a rollout is finished. If we set the rollout length to about 100, our estimate of V will be very accurate, but convergence (optimisation) speed will be at least 10x slower than with a 10-step rollout. In reality it is even worse, considering that with 10-step rollouts the experiences starting from step 11 will be collected under an already updated (and improved) policy after the first roll, and so on; while in the longer case all one hundred experiences will be collected under the same policy.

  3. Estimator implementation: when we use RNN-based estimators, one should remember that we actually (and silently) use a truncated version of BPTT (we pass the RNN state from rollout to rollout but do not allow backward gradient flow between rolls), which is known to be biased: it favors short-length dependencies at the expense of long-range ones (see the illustration at the end of this comment).
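
To make points 1 and 2 concrete, here is a small numeric sketch (generic n-step return code, not the exact btgym implementation): every step inside the rollout contributes real observed reward, while everything beyond the rollout boundary is summarized by the bootstrapped tail value.

```python
# Generic n-step return sketch (not btgym's exact code): each rollout yields
# value targets built from real rewards plus a bootstrapped tail value V(s_T).
import numpy as np

def n_step_targets(rewards, tail_value, gamma=0.99):
    """Discounted n-step value targets for a single rollout.

    rewards    -- rewards actually observed inside the rollout
    tail_value -- V(s_T): critic's estimate of everything past the rollout
    """
    targets = np.zeros(len(rewards))
    running = tail_value                      # recursive tail based on current V
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets

# Short rollout: targets lean heavily on the (possibly inaccurate) tail estimate.
print(n_step_targets([0.1, -0.2, 0.05], tail_value=1.0))
# Long rollout: more real reward and less bootstrap bias, but only one SGD
# update per 100 env steps instead of ten.
print(n_step_targets([0.1] * 100, tail_value=1.0))
```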

A rollout of size 5 is OK for Atari games but can be insufficient for learning more complex patterns; 20 is a good starting point.
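
Regarding point 3, a bare-bones illustration (generic TF1-style code, not btgym's actual graph) of why this amounts to truncated BPTT: the carried-over LSTM state enters the next rollout's graph through placeholders, i.e. as plain data, so no gradient can flow back across the rollout boundary.

```python
# Illustration only (generic TF1-style, not btgym's actual graph code).
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(64)

# The state carried over from the previous rollout is fed in as data through
# placeholders, so backprop stops here: gradients never cross rollout boundaries.
c_in = tf.placeholder(tf.float32, [1, 64])
h_in = tf.placeholder(tf.float32, [1, 64])
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

obs = tf.placeholder(tf.float32, [1, None, 16])  # one rollout of observations
outputs, state_out = tf.nn.dynamic_rnn(cell, obs, initial_state=state_in)
# state_out is fetched with sess.run() and fed back into (c_in, h_in) for the
# next rollout; dependencies longer than rollout_length can only be carried
# forward through the state values, never trained through gradients.
```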