AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

there's a future function at neo_finrl/env_stock_trading/env_stocktrading_cashpenalty.py #317

Closed linchunquan closed 2 years ago

linchunquan commented 3 years ago

I found that there's a future function (lookahead) at neo_finrl/env_stock_trading/env_stocktrading_cashpenalty.py: on day T, the agent should only be able to observe the stock's close price of day T-1. After I fixed this, the trading results are not good any more.

BruceYanghy commented 3 years ago

Please create a pull request; we will evaluate this "future function". Thanks.

YangletLiu commented 3 years ago

I checked the code; @linchunquan seems to have mixed up the boundaries of Python arrays. When you index up to T, the available elements are 0, ..., T-1. In this case, there is no future information. If that is not the case, please let me know.
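
To make the boundary point concrete, here is a toy illustration (my own sketch, not FinRL's actual code) of how an exclusive upper bound excludes the current day:

```python
# Toy sketch: slicing with an exclusive upper bound T yields indices
# 0 .. T-1 only, so day T's close is never part of the observation.
closes = [10.0, 11.0, 12.0, 13.0]  # hypothetical daily closes, index = day

T = 2
observable = closes[:T]  # indices 0 .. T-1
print(observable)        # [10.0, 11.0] -- day T's close is excluded
```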

linchunquan commented 3 years ago

@BruceYanghy @XiaoYangLiu-FinRL Apologies for replying so late, I've been too busy... Anyway, please check lines 359 to 367:

```
359    self.date_index += 1
360    if self.turbulence_threshold is not None:
361        self.turbulence = self.get_date_vector(
362            self.date_index, cols=["turbulence"]
363        )[0]
364    # Update State
365    state = (
366        [coh] + list(holdings_updated) + self.get_date_vector(self.date_index)
367    )
```

At line 359, date_index moves forward one step, so self.get_date_vector(self.date_index) fetches the next day's stock prices and exposes them as observable information to the agent at the next step. However, in the real world the agent should not see the three price types (close/high/low) until the stock market closes.

The same issue also appears in the reset function; please check line 152:

```
152    + self.get_date_vector(self.date_index)
```

I think the correct observable stock information at self.date_index should be self.get_date_vector(self.date_index - 1).

All in all, when the agent starts to make a trading decision on day T, it should only observe stock prices from before day T.
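
For concreteness, here is a minimal runnable sketch (my own simplification, not FinRL's actual class) of the lagged observation being proposed:

```python
# Toy environment sketch: on day T the agent only observes day T-1's close,
# i.e. the state is built from date_index - 1 rather than date_index.
class LaggedObsEnv:
    def __init__(self, closes):
        self.closes = closes   # closes[t] = close price of day t
        self.date_index = 1    # start on day 1, so day 0 has already closed

    def get_date_vector(self, t):
        return [self.closes[t]]

    def step(self, action):
        self.date_index += 1
        # Use the last *closed* day; date_index itself has not closed yet.
        state = self.get_date_vector(self.date_index - 1)
        return state

env = LaggedObsEnv([100.0, 102.0, 101.0, 105.0])
print(env.step(action=None))  # [102.0] -- day 1's close, not day 2's
```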

Athe-kunal commented 3 years ago

That self.date_index += 1 is part of the step function. Here, after taking an action, your date index should move to the next time step.

linchunquan commented 3 years ago

> That self.date_index += 1 is part of the step function. Here, after taking an action, your date index should move to the next time step.

The issue I pointed out is not about why date_index is increased; I was explaining why I said some functions involve future information. Again, when the RL agent works on day T before the stock market closes, it cannot acquire the close/high/low prices of day T until the market has closed.

Athe-kunal commented 3 years ago

Oh okay, I understand your query now. What if we say that the agent trades at the end of the day, or just one minute before trading ends? One minute will not make much difference, and we can include the current day's observation in our state space. I guess it will be an assumption in the methodology. What do you think, @linchunquan?

linchunquan commented 3 years ago

Sorry, I don't think trading in the last minutes based on the previous day's close price is workable in the real world :-) @Athe-kunal

BruceYanghy commented 3 years ago


So we take actions to trade based on the advice of our DRL Trader at the end of the day at time t (time t's close price equals time t+1's open price). We hope to benefit from these actions by the end of the day at time t+1.

https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-multiple-stock-trading-7b00763b7530 @linchunquan
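
To spell out this timing convention numerically, here is a worked sketch (my own, assuming the close-equals-next-open approximation from the article):

```python
# Sketch: the agent observes day t's close, trades at that close price,
# and the reward is realized (marked to market) at day t+1's close.
closes = [100.0, 102.0]  # hypothetical closes for days t and t+1

cash, shares = 1000.0, 0
value_t = cash + shares * closes[0]   # account value at the end of day t

# End of day t: the DRL trader's advice is to buy 5 shares at day t's close.
shares += 5
cash -= 5 * closes[0]

# End of day t+1: mark to market; reward = change in account value.
value_t1 = cash + shares * closes[1]
print(value_t1 - value_t)  # 5 * (102 - 100) = 10.0
```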

linchunquan commented 3 years ago

@BruceYanghy Thanks for the explanation :-) However, there seem to be some puzzling points about the above article:

  1. Since we trained the DRL agent on daily k-lines, would it be suitable to trade in the real world based on minute-level k-lines? (This assumes that the time interval between t and t+1 mentioned in the above article is one minute.)
  2. I think the higher the transaction frequency, the harder it is for each transaction to succeed. If some transactions fail, how do we keep following the DRL agent's decisions step by step?
  3. Some markets around the world trade under a T+1 rule, which means we cannot buy and sell the same shares of the same stock within one day (a toy sketch of this constraint appears at the end of this comment).

For points 2 and 3, I think it would be more realistic to trade only once on day T, based on information available before T, and the DRL environment/agent should be designed and trained under the same assumptions and methodology.
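
As a follow-up to point 3, here is a toy sketch (my own, not part of FinRL) of what enforcing a T+1 constraint inside an environment could look like:

```python
# Toy T+1 ledger: shares bought today are locked until the next trading day.
class T1Ledger:
    def __init__(self):
        self.sellable = 0      # shares bought on previous days
        self.bought_today = 0  # shares bought today, locked until tomorrow

    def buy(self, qty):
        self.bought_today += qty

    def sell(self, qty):
        if qty > self.sellable:
            raise ValueError("T+1 rule: cannot sell shares bought today")
        self.sellable -= qty

    def next_day(self):
        # At the day rollover, today's purchases become sellable.
        self.sellable += self.bought_today
        self.bought_today = 0

ledger = T1Ledger()
ledger.buy(100)
try:
    ledger.sell(100)  # same-day sell is rejected
except ValueError as e:
    print(e)
ledger.next_day()
ledger.sell(100)      # allowed on day T+1
```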