Closed · linchunquan closed this issue 2 years ago
Please create a pull request; we will evaluate this "future function". Thanks.
I checked the code; @linchunquan seems to have mixed up the boundaries of Python arrays. When you index with T, the available indices are 0, ..., T-1. In that case, there is no future information. If that is not the case, please let me know.
@BruceYanghy @XiaoYangLiu-FinRL apologies for the late reply, I've been too busy... Anyway, please check lines 359 to 367:
359     self.date_index += 1
360     if self.turbulence_threshold is not None:
361         self.turbulence = self.get_date_vector(
362             self.date_index, cols=["turbulence"]
363         )[0]
364     # Update State
365     state = (
366         [coh] + list(holdings_updated) + self.get_date_vector(self.date_index)
367     )
At line 359, date_index moves forward one step, so self.get_date_vector(self.date_index) fetches the next day's stock prices and exposes them as observable information to the agent at the next step. However, in the real world, the agent should not see the three price types (close/high/low) until the stock market has closed.
The same issue is also found in the reset function; please check line 152:
152 + self.get_date_vector(self.date_index)
I think the correct observable stock information at self.date_index should be self.get_date_vector(self.date_index - 1).
All in all, when the agent starts to make a trading decision on day T, it should only observe stock prices from before day T.
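To make the proposed fix concrete, here is a minimal sketch (not the actual FinRL environment; the class, its fields, and the price data are hypothetical stand-ins) of a step function that lags the observation by one day, so on day T the agent only sees day T-1's close:

```python
import numpy as np

class LaggedObsEnv:
    """Toy environment illustrating a one-day observation lag."""

    def __init__(self, close_prices):
        self.prices = np.asarray(close_prices, dtype=float)  # daily closes
        self.date_index = 1  # start at 1 so date_index - 1 is always valid

    def _obs(self):
        # Observation for day T uses only day T-1's close price,
        # mirroring the suggested get_date_vector(self.date_index - 1).
        return self.prices[self.date_index - 1]

    def step(self, action):
        # Order execution may settle at day T's close internally, but the
        # observation returned to the agent must not include day T's prices.
        self.date_index += 1
        done = self.date_index >= len(self.prices)
        obs = self._obs() if not done else self.prices[-1]
        return obs, done

env = LaggedObsEnv([10.0, 11.0, 12.0, 13.0])
obs, done = env.step(action=None)  # date_index is now 2 (day T)
print(obs)  # 11.0, i.e. day T-1's close, not 12.0
```

The key point is only which index feeds the returned observation; the rest of the bookkeeping (cash, holdings, turbulence) can stay as it is.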
That self.date_index += 1 is for the step function. Here, after taking an action, your date index should move to the next time step.
The issue I pointed out is not about why date_index is increased. I was just explaining why I said some functions involve future information. Again, when the RL agent acts on day T before the stock market closes, it cannot acquire the close/high/low prices of day T until the market has closed.
Oh okay, I understand your query now. What if we say that the training agent trades at the end of the day, or just one minute before trading ends? One minute will not make much difference, and we can include the current day's observation in our state space. I guess it would be an assumption in the methodology. What do you think @linchunquan?
Sorry, I don't think trading in the last minutes based on the previous day's close price is workable in the real world :-) @Athe-kunal
So we take actions to trade based on the advice of our DRL trader at the end of the day at time t (time t's close price equals time t+1's open price). We hope to benefit from these actions by the end of the day at time t+1.
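Under this assumption the bookkeeping is straightforward; here is a tiny hedged illustration with made-up numbers (none of these figures come from the thread):

```python
# Assumption from the discussion: day t's close price equals day t+1's
# open price, so an order placed at the end of day t fills at close_t.
close_t = 100.0   # fill price at the end of day t (hypothetical)
close_t1 = 103.0  # close of day t+1 (hypothetical)
shares = 10       # shares bought on the DRL trader's advice

# Profit realized by the end of day t+1 under this assumption.
pnl = shares * (close_t1 - close_t)
print(pnl)  # 30.0
```

The whole debate above is about which of these quantities the agent is allowed to *observe* when it decides: only information available up to the fill, never close_t1.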
https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-multiple-stock-trading-7b00763b7530 @linchunquan
@BruceYanghy thanks for the explanation :-) However, there seem to be some puzzling points about the article above:
For points 2 and 3, I think it would be more realistic to trade only once on day T based on information from before T, and the DRL environment/agent should be designed and trained under the same assumptions and methodology.
I found that there is a future function in neo_finrl/env_stock_trading/env_stocktrading_cashpenalty.py, because on day T the agent should only observe the stock's close price of day T-1. After I fixed this, the trading results are no longer good.