why is softmax applied twice when actions are transferred to portfolio weights?

TradeMaster-NTU / TradeMaster

TradeMaster is an open-source platform for quantitative trading empowered by reinforcement learning :fire: :zap: :rainbow:

Apache License 2.0


Technically, you only need to use Softmax once to get the portfolio weights.

However, during training we found that the PnL fluctuated too much and the agent had great difficulty converging, owing to the high stochasticity of the market. Applying Softmax a second time makes the weights more even: its inputs already lie between 0 and 1, so no resulting weight can exceed another by more than a factor of e. With a flatter allocation, the PnL fluctuates less, which makes it easier for RL agents to converge.
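The flattening effect can be seen in a small NumPy sketch. This is only an illustration, not the TradeMaster code: the `softmax` helper and the `actions` values are made up for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())  # subtract the max to avoid overflow
    return z / z.sum()

# Hypothetical raw action vector from an agent (illustrative values only).
actions = np.array([2.0, 0.5, -1.0, 0.0])

once = softmax(actions)   # a valid portfolio: positive, sums to 1
twice = softmax(once)     # second pass: inputs lie in (0, 1), so the output is flatter

print("softmax once :", np.round(once, 3))
print("softmax twice:", np.round(twice, 3))
print("max/min ratio once :", once.max() / once.min())
print("max/min ratio twice:", twice.max() / twice.min())  # always below e
```

Both vectors are valid weight vectors and keep the same ranking of assets. The second pass only shrinks the spread between the largest and smallest positions, which is what dampens the PnL swings.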

In short, it is a compromise that compensates for earlier methods' inability to handle a highly stochastic environment. You can remove the second Softmax if your algorithm can handle the fluctuations.


why is softmax applied twice when actions are transferred to portfolio weights? #196