Closed AdrianP- closed 7 years ago
@AdrianP-, the state shape is unrelated to how often the agent takes an action. It is just the time-embedding dimension, like frame stacking in the Atari domain, meant to help with solving temporally dependent POMDPs. Shape (1, 4) means one candle; shape (10, 4) means the last ten candles as one observation.
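To illustrate the time-embedding idea, here is a minimal sketch in plain Python. It is not btgym code: the `embed` helper and the OHLC values are hypothetical, made up for the example. It only shows that a larger first dimension widens each observation and does not change how often observations arrive.

```python
from collections import deque


def embed(candles, time_dim):
    """Yield one observation per new candle: the last `time_dim` candles.

    Hypothetical helper for illustration; not part of btgym.
    """
    window = deque(maxlen=time_dim)
    for candle in candles:
        window.append(candle)
        # Emit only once the window is full (warm-up period).
        if len(window) == time_dim:
            yield list(window)


# Made-up (open, high, low, close) candles.
candles = [
    (1.0, 1.2, 0.9, 1.1),
    (1.1, 1.3, 1.0, 1.2),
    (1.2, 1.4, 1.1, 1.3),
    (1.3, 1.5, 1.2, 1.4),
]

obs_1 = list(embed(candles, 1))  # each observation has shape (1, 4)
obs_3 = list(embed(candles, 3))  # each observation has shape (3, 4)
```

After the warm-up period, both settings produce a new observation on every candle; only the amount of history inside each observation differs.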
There is another control that tells the agent exactly how many observations to skip before taking another action (in-between actions are assumed to be 'hold'). It's described in the source code, line 159 in btgym/btgym/backtrader.py:
skip_frame=None,
# Number of environment steps to skip before returning next response,
# e.g. if set to 10 -- agent will interact with environment every 10th episode step;
# Every other step agent's action is assumed to be 'hold'.
# Note: INFO part of environment response is a list of all skipped frame's info's,
# i.e. [info[-9], info[-8], ..., info[0]].
)
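The behaviour described in that comment can be sketched roughly as follows. This is an illustration under stated assumptions, not btgym's implementation: `ToyEnv` and `step_with_skip` are hypothetical names, the agent's action is assumed to be applied on the first of the skipped steps, and returning the last step's reward is also an assumption.

```python
class ToyEnv:
    """Hypothetical stand-in environment; not btgym's API."""

    def __init__(self, length=30):
        self.length = length
        self.t = 0

    def step(self, action):
        self.t += 1
        obs = self.t
        reward = 0.0
        done = self.t >= self.length
        info = {"step": self.t, "action": action}
        return obs, reward, done, info


def step_with_skip(env, action, skip_frame, hold="hold"):
    """Apply `action` once, then 'hold' for the remaining skipped steps.

    Returns the last observation, the last reward (an assumption),
    the done flag, and the list of info dicts from every step taken.
    """
    if skip_frame < 1:
        raise ValueError("skip_frame must be at least 1")
    infos = []
    for i in range(skip_frame):
        # Assumption: the agent's action lands on the first step only.
        obs, reward, done, info = env.step(action if i == 0 else hold)
        infos.append(info)
        if done:
            break
    return obs, reward, done, infos


env = ToyEnv()
obs, reward, done, infos = step_with_skip(env, "buy", skip_frame=10)
```

With `skip_frame=10` the agent decides once per ten environment steps, and `infos` holds one entry for each of those ten steps.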
It should be in the documentation, but I just don't have time to write it properly. I'll try to fix it as soon as I can.
btgym/examples/setting_up_environment_full.ipynb has an example with skipped frames.
Sorry, I explained it wrong. I get your point, but as far as I know, you should get only one new observation per step. That is the strategy in OpenAI Baselines, because every new action is saved in a ReplayBuffer.
Nevertheless, the patch is trivial, so I'm going to do it for my version :)
Ok!
If you think in algo-trading terms, the norm is to take an action every time you get a new candle, so the agent should do the same. But btgym isn't prepared for state_shape=(1, 4). Is there a reason for that?