Open BruceYanghy opened 2 years ago
I made the same observation in my real-life testing. One explanation I heard is that if selling brings no benefit during the training period, the model will simply not learn to do it. The other 'issue' is that over the last few years the market has been in an almost constant uptrend, so a buy-and-hold strategy may simply be the most effective, more beneficial than multiple trades back and forth in and out of a position. I wonder if there is a link between the interval used to train the AI and the interval later used for paper trading (1 minute).
L
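To make the buy-and-hold argument concrete, here is a toy sketch (not FinRL code). It assumes a synthetic price series that rises 0.1% per step and a hypothetical proportional fee `cost_rate` of 0.1% on every buy and sell. `final_value` is a made-up helper for illustration only.

```python
def final_value(prices, in_market, cost_rate=0.001, cash=1.0):
    """Simulate all-in / all-out trading on a single asset.

    prices:    one price per step
    in_market: one bool per step, True if we want to be invested
    cost_rate: assumed proportional fee charged on each buy and each sell
    """
    shares = 0.0
    for price, want in zip(prices, in_market):
        if want and shares == 0.0:
            # Buy with all available cash, paying the fee.
            shares = cash * (1 - cost_rate) / price
            cash = 0.0
        elif not want and shares > 0.0:
            # Sell the whole position, paying the fee.
            cash = shares * price * (1 - cost_rate)
            shares = 0.0
    # Mark any remaining position to the last price.
    return cash + shares * prices[-1]


# Synthetic steady uptrend: +0.1% per step for 250 steps.
prices = [100 * 1.001 ** t for t in range(250)]

hold = final_value(prices, [True] * 250)                       # buy once, never sell
churn = final_value(prices, [t % 2 == 0 for t in range(250)])  # in and out every step

print(f"buy and hold: {hold:.3f}, frequent trading: {churn:.3f}")
```

In this setup each round trip gains one step of drift but pays the fee twice, so frequent trading ends below buy and hold. An agent trained on such data has little reason to learn to sell.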
On Jan 16, 2022, at 6:20 AM, Bruce Yang @.***> wrote:
https://user-images.githubusercontent.com/31713746/149661247-0525bf74-966a-4a01-89e3-4c31d9379865.png
We found that during the backtesting period of the StockTrading demo, the number of stock adjustments was very small. Most stocks established a large position in the early stage of trading, and the position was then hardly adjusted, with very little repositioning. As time goes on, our portfolio is no different from the market index.
This picture shows the stock price (left column) and position (right column) over the backtest range we are looking at. Different colored lines in the right column represent the results of multiple trials. The x-axis represents time. We observe that many stocks are rarely rebalanced (3-year backtest interval).
We are asking for discussion on how to make the RL agent increase its turnover rate (repositioning frequency) during trading.
Thanks.
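One way to quantify "rarely rebalanced" is to compute a turnover figure from the backtest positions. The sketch below is an assumption-laden illustration, not part of FinRL: `turnover_stats` is a hypothetical helper, `holdings` and `prices` are assumed to be arrays of shape (days, stocks), and portfolio value ignores cash.

```python
import numpy as np


def turnover_stats(holdings, prices):
    """Return (daily_turnover, rebalance_days).

    holdings: shares held per day and stock, shape (T, N)
    prices:   price per day and stock, shape (T, N)
    daily_turnover: traded value / stock portfolio value, per day (length T-1)
    rebalance_days: number of days on which each stock's position changed
    """
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)

    trades = np.diff(holdings, axis=0)              # shares traded day over day
    traded_value = np.abs(trades) * prices[1:]      # value of those trades
    # Assumption: portfolio value counts stock positions only, no cash.
    portfolio_value = (holdings * prices).sum(axis=1)[1:]
    daily_turnover = traded_value.sum(axis=1) / np.maximum(portfolio_value, 1e-12)
    rebalance_days = (trades != 0).sum(axis=0)
    return daily_turnover, rebalance_days


# Tiny made-up example: stock 0 is never adjusted, stock 1 is bought once.
holdings = [[10, 0], [10, 0], [10, 5], [10, 5]]
prices = [[100, 20]] * 4
daily_turnover, rebalance_days = turnover_stats(holdings, prices)
print(daily_turnover, rebalance_days)
```

Comparing `rebalance_days` across trials would give a number to track when trying changes such as a different fee or reward.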
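Could we change the agent's behavior (its actions) through the definition of the reward function? Stock prices trend upward overall, so an agent that simply chooses to hold stocks from the start still earns a decent return (reward) by the end of the episode. What if, while the agent is holding stocks, a management fee were deducted continuously and periodically? This would reduce the cumulative reward of long-term holding and encourage the agent to trade somewhat more frequently, earning small price differences to increase the money made within an episode. This would be defined in the training environment.
With the reward function defined this way in the training environment, and the test environment not subject to the holding-fee deduction, could the agent's behavior change in the test environment? Instead of always holding stocks for the long term, it would sell when it predicts a price drop and buy back when it predicts a rise, earning a larger spread.
This is just my idea; it still needs to be verified by follow-up experiments.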
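The holding-fee idea could be sketched roughly as follows. This is a minimal illustration, not FinRL's reward: `shaped_reward`, `holding_fee_rate`, and the `training` flag are hypothetical names, and the fee rate is an arbitrary placeholder that would need tuning.

```python
def shaped_reward(prev_value, new_value, holdings_value,
                  holding_fee_rate=1e-4, training=True):
    """Change in portfolio value, minus a per-step holding fee during training.

    prev_value, new_value: total portfolio value before and after the step
    holdings_value:        market value of the stocks currently held
    holding_fee_rate:      assumed per-step fee as a fraction of holdings_value
    training:              the fee is applied only when True (training env)
    """
    reward = new_value - prev_value
    if training:
        # Penalize sitting on a position: the larger the holding, the larger the fee.
        reward -= holding_fee_rate * holdings_value
    return reward


# Same step evaluated in the training env (fee applied) and the test env (no fee).
train_r = shaped_reward(1000.0, 1010.0, 800.0, holding_fee_rate=0.001, training=True)
test_r = shaped_reward(1000.0, 1010.0, 800.0, holding_fee_rate=0.001, training=False)
print(train_r, test_r)
```

Whether this actually increases trading, or just pushes the agent toward holding cash, is exactly what the follow-up experiments would have to show.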
Why is this issue closed?
I also find that the agent no longer trades after buying a large position in the early stage of trading.
I also tried setting the fee to 0, hoping the agent would be more willing to trade, but in vain. In this case, the agent does not seem to learn any reasonable strategy.
Now we have more discussions, since you joined. Let me reopen it. LoL.
I guess that there are two reasons. First, the factors can't explain the returns very well, so the system is afraid to make transactions; it holds the stocks because their prices will rise in the long term. Second, the system should be encouraged to make transactions. We can update the environment, as @A5230171 said, to charge a fee for holding.
I tried tuning the fee to 0 and adding more factors. The agent is still afraid to make transactions.
Charging a fee for holding seems to be a good idea.
Did anybody manage to overcome this?
Any update on this issue @BruceYanghy ?