AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Some code questions #465

Closed: ghaffari903 closed this issue 2 years ago

ghaffari903 commented 2 years ago

Again, thanks for your implementations. I have been studying the code and have some questions; thanks to anyone who writes an answer.

1. What is the use of reward scaling?
2. In the step function (https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/finrl_meta/env_stock_trading/env_stock_trading.py): what are min_stock_rate and min_action for? In trading we see qty < min_action; I did not understand what it is used for.

3. I printed the actual actions in this code; they are 0 most of the time, yet the reward is still applied to the amount:

```python
if self.turbulence_bool[self.day] == 0:
    min_action = int(self.max_stock * self.min_stock_rate)  # stock_cd
    for index in np.where(actions < -min_action)[0]:  # sell_index:
        if price[index] > 0:  # Sell only if current asset is > 0
            sell_num_shares = min(self.stocks[index], -actions[index])
            self.stocks[index] -= sell_num_shares
            self.amount += (
                price[index] * sell_num_shares * (1 - self.sell_cost_pct)
            )
            self.stocks_cool_down[index] = 0
    for index in np.where(actions > min_action)[0]:  # buy_index:
        if (
            price[index] > 0
        ):  # Buy only if the price is > 0 (no missing data in this particular date)
            buy_num_shares = min(self.amount // price[index], actions[index])
            self.stocks[index] += buy_num_shares
            self.amount -= (
                price[index] * buy_num_shares * (1 + self.buy_cost_pct)
            )
            self.stocks_cool_down[index] = 0
```

Athe-kunal commented 2 years ago

Hi @ghaffari903

  1. (Not very sure) The reward scaling is related to slippage. The price at which you initiate a trade changes by the time it is executed, so you generally do not transact at exactly the price you saw when you placed the order. reward_scaling is used to accommodate this. Read more about slippage here. See the sketch after this list for where reward_scaling enters the reward computation.
  2. min_action and min_stock_rate can be used to decide whether to make a transaction at all. If you look at the np.where() calls on lines 109 and 117, the agent transacts only if the order size exceeds min_action. A higher min_action gives you higher resistance to false signals: the transaction is executed only when the order size is large, otherwise it is skipped. min_stock_rate scales the maximum number of shares (self.max_stock) you can transact in one go; see the thresholding demo after this list.
  3. Can you further explain this question?
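
For context on point 1, here is a minimal sketch of where reward_scaling typically enters the reward computation in this family of environments. The attribute names mirror the FinRL-Meta StockTradingEnv but should be treated as assumptions, not a verbatim copy: the change in total asset value between steps is multiplied by reward_scaling before being handed to the agent.

```python
# A minimal sketch of the reward computation inside step(), assuming
# FinRL-Meta-style quantities (amount = cash, stocks = shares held,
# reward_scaling); not a verbatim copy of the library code.
import numpy as np

def compute_reward(amount, stocks, price, prev_total_asset, reward_scaling):
    """Reward = scaled change in total portfolio value since the previous step."""
    total_asset = amount + (stocks * price).sum()  # cash + market value of holdings
    return (total_asset - prev_total_asset) * reward_scaling, total_asset

# Example with made-up numbers; a small scale such as 2**-11 keeps the
# per-step reward in a magnitude range that PPO-style agents handle well.
reward, total_asset = compute_reward(
    amount=10_000.0,
    stocks=np.array([5.0, 2.0]),
    price=np.array([101.0, 49.0]),
    prev_total_asset=10_600.0,
    reward_scaling=2 ** -11,
)
print(reward, total_asset)
```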
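
And a small self-contained demo of the thresholding described in point 2, with made-up numbers: only indices whose proposed order size exceeds min_action are selected for a trade, so a larger min_action filters out weak signals.

```python
import numpy as np

# Hypothetical numbers for illustration only.
max_stock = 100
min_stock_rate = 0.1
min_action = int(max_stock * min_stock_rate)     # = 10 shares

actions = np.array([3, -25, 12, -7, 40])         # share counts proposed by the agent

sell_index = np.where(actions < -min_action)[0]  # [1]: only -25 clears the gate
buy_index = np.where(actions > min_action)[0]    # [2, 4]: 12 and 40 clear the gate
print(sell_index, buy_index)
```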
ghaffari903 commented 2 years ago

@Athe-kunal Thank you for the detailed response,

  1. Great
  2. When I print the order size (sell_num_shares or buy_num_shares), it is less than min_action (min_action is 10 and the quantity is less than 10)! Line 111 seems to contradict line 109 (same for 121 and 117).
  3. When you print the actual actions in the test results, you can see the result changes even without any trading action! It seems that the reward changes and the executed transactions are not consistent with each other.
Athe-kunal commented 2 years ago
  1. If the action is 0, then it is Buy and Hold, so the reward is the return of that strategy. The stocks that you hold also appreciate in value; the reward corresponds to that daily return (see the sketch after this list).
  2. Selling actions are less than 0 and buying actions are more than 0
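
To illustrate point 1, a tiny sketch with made-up numbers: the portfolio value, and hence the reward, changes between steps even when no shares are bought or sold, simply because the held shares are repriced.

```python
import numpy as np

# Hypothetical holdings and prices for illustration only.
cash = 10_000.0
stocks = np.array([5.0, 2.0])                    # shares held, unchanged (no trade)
price_yesterday = np.array([100.0, 50.0])
price_today = np.array([103.0, 48.0])

asset_yesterday = cash + (stocks * price_yesterday).sum()   # 10600.0
asset_today = cash + (stocks * price_today).sum()           # 10611.0

reward_scaling = 2 ** -11                        # assumed scale, as in the earlier sketch
reward = (asset_today - asset_yesterday) * reward_scaling   # nonzero despite zero actions
print(reward)
```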
ghaffari903 commented 2 years ago

@Athe-kunal You mentioned very good points. You are wonderful!

  1. "Selling actions are less than 0 and buying actions are more than 0": yes, I know; my confusion was about the reward changing without any action. That seems to be resolved now.
ghaffari903 commented 2 years ago

@Athe-kunal Dear Astarag, what do you think about the poor test results of the best-tuned model? Although we train the best model on the training data, I do not get good results on the test data. Some runs are very good (70-80% better than the index), but some are disappointing (-50% under the index in the backtest!).

Athe-kunal commented 2 years ago

Yeah, the RL agent is really sensitive: every time the initial seed value changes, the whole trajectory changes, and we don't get very conclusive evidence. Also, the training data distribution differs from the test data distribution; for example, training a model on Tesla before 2020 and testing it afterward may lead to sub-optimal results. So we can work on augmenting the training data to capture these variations. I am closing this issue; if you have any further questions, please feel free to re-open.
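
One way to get less seed-dependent evidence, sketched below under the assumption that the agent is a Stable-Baselines3 model and that make_train_env, make_test_env, and backtest_return are user-supplied helpers (hypothetical placeholders, not FinRL APIs): train with several seeds and report the mean and spread of the backtest performance rather than a single run.

```python
import numpy as np
from stable_baselines3 import PPO

# Placeholder helpers: build your FinRL train/test environments and compute
# the cumulative return of a trained model on the test window. These are
# hypothetical stand-ins, not part of the FinRL API.
def make_train_env():
    raise NotImplementedError("construct your training environment here")

def make_test_env():
    raise NotImplementedError("construct your test environment here")

def backtest_return(model, env) -> float:
    raise NotImplementedError("run the model on env and return its cumulative return")

returns = []
for seed in range(5):                            # several seeds instead of a single run
    model = PPO("MlpPolicy", make_train_env(), seed=seed, verbose=0)
    model.learn(total_timesteps=100_000)
    returns.append(backtest_return(model, make_test_env()))

print(f"mean return: {np.mean(returns):.2%} +/- {np.std(returns):.2%}")
```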