TradeMaster-NTU / TradeMaster

TradeMaster is an open-source platform for quantitative trading empowered by reinforcement learning :fire: :zap: :rainbow:
Apache License 2.0

why is softmax applied twice when actions are transferred to portfolio weights? #196

Closed: tengyaolong2000 closed this issue 3 months ago

tengyaolong2000 commented 9 months ago

Softmax is applied to the action in the trainer:

https://github.com/TradeMaster-NTU/TradeMaster/blob/bc5a30a2ec07a65384cc74c0f46c4c34114ea25e/trademaster/trainers/portfolio_management/trainer.py#L149

It is then applied again in the environment, where the action is converted into portfolio weights:

https://github.com/TradeMaster-NTU/TradeMaster/blob/bc5a30a2ec07a65384cc74c0f46c4c34114ea25e/trademaster/environments/portfolio_management/environment.py#L125

Is there a specific reason for applying it twice? Thanks for your time.

qinmoelei commented 3 months ago

Technically, you only need to use Softmax once to get the portfolio weights.

However, during training we found that the PnL fluctuated too much and the agent struggled to converge, owing to the high stochasticity of the market. Applying Softmax twice makes the weights more even, so the PnL fluctuates less and it is easier for RL agents to converge.

In short, it is a compromise to work around earlier methods' inability to handle a highly stochastic environment. You can remove the second Softmax if your algorithm can handle the fluctuations.
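To illustrate the flattening effect described above, here is a minimal sketch. It is not TradeMaster's code: the `softmax` helper and the `raw_action` values are hypothetical, and it assumes the second softmax receives the output of the first unchanged. Because the first pass yields values in [0, 1], the second pass sees inputs that differ by less than 1, so the ratio between any two final weights is exp(p_i - p_j) < e ≈ 2.72.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw action scores for four assets (illustrative values only).
raw_action = [3.0, 1.0, 0.0, -1.0]

once = softmax(raw_action)  # one softmax: already valid portfolio weights
twice = softmax(once)       # second softmax: inputs now lie in [0, 1]

print("once: ", [round(w, 3) for w in once])
print("twice:", [round(w, 3) for w in twice])
print("max/min ratio once: ", round(max(once) / min(once), 2))
print("max/min ratio twice:", round(max(twice) / min(twice), 2))
```

Both results sum to 1, but the second one is much closer to an equal-weight portfolio. Dropping the second call restores the sharper allocation, at the cost of the larger PnL swings mentioned above.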