AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

Step price #70

Closed Benjy96 closed 2 years ago

Benjy96 commented 2 years ago

Regarding the current_price in _update_profit and _calculate_reward, is it setting the position of the agent to be the next tick AFTER the observation? For example, the env/agent observes a close of 10, predicts buy, and then doesn't actually buy at 10, but will buy the following tick? So on the daily timeframe it would observe 10, predict buy, but then buy the next daily close, which could be something like 11? Perhaps I'm misunderstanding. Thanks

AminHP commented 2 years ago

Yes, that's right. In a real application, the agent would buy at the next daily open price. But in this env, since everything is kept simple, only the close price is present. So the agent has to buy at the next daily close price. However, you can easily change the env in a way to buy with the next daily open price. Anyway, if you need a more precise env close to real-world applications, my gym-mtsim project is available here.
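
To make the timing concrete, here is a minimal toy sketch of the convention described above: a decision made after observing the close at tick t is filled at the close of tick t + 1. It does not use gym-anytrading's code; `fill_prices` and the sample prices are hypothetical names and values chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def fill_prices(
    closes: Sequence[float], decisions: Sequence[str]
) -> List[Tuple[int, float, str, float]]:
    """Pair each decision with the close it was based on and the close it fills at.

    Assumption (toy model, not the library's implementation): the decision at
    index t is made after seeing closes[t] and is executed at closes[t + 1].
    """
    fills = []
    for t, decision in enumerate(decisions):
        if t + 1 >= len(closes):
            break  # no following tick left to execute on
        fills.append((t, closes[t], decision, closes[t + 1]))
    return fills


# Hypothetical daily closes and decisions.
closes = [10.0, 11.0, 10.5, 12.0]
decisions = ["buy", "hold", "sell"]

for t, seen, decision, fill in fill_prices(closes, decisions):
    print(f"tick {t}: observed close {seen}, decided {decision!r}, filled at {fill}")
```

With these sample numbers, the "buy" decided after observing 10.0 is filled at 11.0, which matches the example in the question.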

Benjy96 commented 2 years ago

Appreciate the response, thanks

Benjy96 commented 2 years ago

One final question @AminHP , if you wanted to keep it simple and just buy immediately once you've seen the close, could you just put self._current_tick += 1 at the end of the env's step function? To simulate buying as soon as you've seen the close (which yes wouldn't be too accurate in reality). I'll take a look at the other project you recommended too :) Thanks again

AminHP commented 2 years ago

You're welcome :)

That might work, but I'm not entirely sure. It should be checked carefully. However, that sequence of events does not make sense to me. I mean, I see the close price of today and the previous days, then I make a decision, and tomorrow I act (at any time of the day) on that decision. But putting self._current_tick += 1 at the end of the function creates some kind of confusion and disorder. Anyway, those are just my thoughts and preferences, and they aren't necessarily right. You can implement it however you prefer.

Benjy96 commented 2 years ago

Yes, I get what you mean, I was just thinking about how to make the smallest change which may work more responsively than buying the next day's close (since if we observe today but then buy a day late, we're missing a day of observation, right?).

Regarding your suggestion to alter the env to use the next Open, I suppose I could add another prices array to the trading_env containing each Open and change step, _calculate_reward, and _update_profit to access that.
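
The idea of keeping a second array of Open prices can be sketched in isolation, without touching the env itself. This is only an illustration under stated assumptions: `NextOpenFills`, `fill_price`, and `long_trade_return` are hypothetical names that do not exist in gym-anytrading, and the prices are made up. Observations come from the closes, while fills use the following tick's open.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NextOpenFills:
    """Toy price container (hypothetical, not part of gym-anytrading)."""

    closes: Sequence[float]  # what the agent observes
    opens: Sequence[float]   # what trades are executed against

    def observe(self, tick: int) -> float:
        """Close price visible to the agent at this tick."""
        return self.closes[tick]

    def fill_price(self, decision_tick: int) -> float:
        """Price paid for a decision made at decision_tick: the next tick's open."""
        return self.opens[decision_tick + 1]


def long_trade_return(prices: NextOpenFills, entry_tick: int, exit_tick: int) -> float:
    """Fractional return of a long trade decided at entry_tick and closed at exit_tick."""
    entry = prices.fill_price(entry_tick)
    exit_ = prices.fill_price(exit_tick)
    return exit_ / entry - 1.0


# Hypothetical daily data: same length, one open and one close per tick.
prices = NextOpenFills(
    closes=[10.0, 11.0, 10.5, 12.0],
    opens=[9.8, 10.2, 11.1, 10.4],
)

# Decide to buy after the close at tick 0, decide to sell after the close at tick 2.
print(prices.fill_price(0), prices.fill_price(2))
print(long_trade_return(prices, entry_tick=0, exit_tick=2))
```

In a real change to the env, the same lookup would have to be applied consistently wherever a trade price is read, since reward and profit must agree on the fill price.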

AminHP commented 2 years ago

Yes, I know. I'm just not sure if that change works because some other things in the env are dependent on the current_tick value and they may fail to work properly.

As for adding an array of Open prices, that's what I would do.

Benjy96 commented 2 years ago

Ok, thanks for your help