Closed: tengyaolong2000 closed this 3 months ago
Technically, you only need to use Softmax once to get the portfolio weights.
During training, however, we found that the PnL fluctuates too much and the agent finds it very hard to converge, because the market is highly stochastic. Applying Softmax a second time makes the weights more even: the first Softmax outputs values in [0, 1], so after the second pass no two weights can differ by more than a factor of e. With flatter weights the PnL fluctuates less, which makes it easier for RL agents to converge.
In short, it is a compromise made because the previous methods could not handle a highly stochastic environment. You can remove the second Softmax if your algorithm can handle the fluctuations.
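To see the flattening effect concretely, here is a minimal sketch in plain Python. It is not TradeMaster's code: the `softmax` helper and the `raw_action` values are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw action scores for four assets (illustrative values only).
raw_action = [2.0, 0.5, -1.0, 0.0]

once = softmax(raw_action)  # valid weights, but possibly concentrated
twice = softmax(once)       # softmax of values in [0, 1]: much flatter

print("once: ", [round(w, 3) for w in once])
print("twice:", [round(w, 3) for w in twice])
print("max/min after once: ", round(max(once) / min(once), 2))
print("max/min after twice:", round(max(twice) / min(twice), 2))
```

Both vectors sum to 1 and keep the same ranking of assets, but the second pass bounds the ratio between the largest and smallest weight by e, so the portfolio is pulled toward equal weighting no matter how confident the raw action was.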
Softmax is applied to the action here:
https://github.com/TradeMaster-NTU/TradeMaster/blob/bc5a30a2ec07a65384cc74c0f46c4c34114ea25e/trademaster/trainers/portfolio_management/trainer.py#L149
Then, in https://github.com/TradeMaster-NTU/TradeMaster/blob/bc5a30a2ec07a65384cc74c0f46c4c34114ea25e/trademaster/environments/portfolio_management/environment.py#L125
softmax is applied again to convert the action into portfolio weights. Is there a specific reason why this is done? Thanks for your time.