Same reward throughout training in DDPG

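```python
# Module path assumed from recent FinRL releases; older releases used a different path.
from finrl.agents.stablebaselines3.models import DRLAgent

# env_train is the vectorized training env, e.g. env_train, _ = e_train_gym.get_sb_env()
agent = DRLAgent(env=env_train)

DDPG_PARAMS = {
    "batch_size": 4096,
    "buffer_size": 1000000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "tau": 0.02,
}

model_ddpg = agent.get_model("ddpg", model_kwargs=DDPG_PARAMS)

# Train the DDPG agent
trained_ddpg = agent.train_model(
    model=model_ddpg,
    tb_log_name="ddpg",
    total_timesteps=50000,
)
```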
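For reference, `tau` in `DDPG_PARAMS` sets how quickly DDPG's target networks track the online networks. Below is a minimal pure-Python sketch of the standard soft (Polyak) update θ' ← τθ + (1 − τ)θ'. It uses plain lists of weights instead of real network parameters, and the helper name `soft_update` is hypothetical.

```python
def soft_update(target, online, tau):
    """Polyak-average online weights into target weights.

    Each target weight moves a fraction `tau` of the way toward the
    matching online weight: t <- tau * o + (1 - tau) * t.
    """
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# With tau = 0.02, each update closes only 2% of the remaining gap.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online, tau=0.02)
print(target)
```

A smaller `tau` produces smoother, slower-moving targets for the critic.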


```text
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 29        |
|    time_elapsed    | 189       |
|    total_timesteps | 5608      |
| train/             |           |
|    actor_loss      | -11.6     |
|    critic_loss     | 0.0618    |
|    learning_rate   | 0.0003    |
|    n_updates       | 5507      |
|    reward          | 0.5398047 |
----------------------------------
day: 1401, episode: 10
begin_total_asset: 100000.00
end_total_asset: 259256.35
total_reward: 159256.35
total_cost: 138.56
total_trades: 72857
Sharpe: 0.778
=================================
----------------------------------
| time/              |           |
|    episodes        | 8         |
|    fps             | 29        |
|    time_elapsed    | 386       |
|    total_timesteps | 11216     |
| train/             |           |
|    actor_loss      | -3.94     |
|    critic_loss     | 0.0111    |
|    learning_rate   | 0.0003    |
|    n_updates       | 11115     |
|    reward          | 0.5398047 |
----------------------------------
----------------------------------
| time/              |           |
|    episodes        | 12        |
|    fps             | 28        |
|    time_elapsed    | 584       |
|    total_timesteps | 16824     |
| train/             |           |
|    actor_loss      | -1.22     |
|    critic_loss     | 0.0419    |
|    learning_rate   | 0.0003    |
|    n_updates       | 16723     |
|    reward          | 0.5398047 |
----------------------------------
```
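To check the pattern in these logs programmatically, you can pull individual rows out of each printed table. Below is a minimal regex sketch that assumes the `| name | value |` layout shown above. This is the plain-text layout as printed, not a documented format, and the helper name `extract_metric` is hypothetical. The sample text reuses the `actor_loss` and `reward` rows from the three blocks above.

```python
import re
from typing import List


def extract_metric(log_text: str, name: str) -> List[float]:
    """Return every value logged under `name`, in order of appearance."""
    pattern = re.compile(
        r"^\|\s+" + re.escape(name) + r"\s+\|\s+(\S+)\s+\|", re.MULTILINE
    )
    return [float(value) for value in pattern.findall(log_text)]


# Rows copied from the three log blocks above.
LOG = """\
|    actor_loss      | -11.6     |
|    reward          | 0.5398047 |
|    actor_loss      | -3.94     |
|    reward          | 0.5398047 |
|    actor_loss      | -1.22     |
|    reward          | 0.5398047 |
"""

rewards = extract_metric(LOG, "reward")
actor_losses = extract_metric(LOG, "actor_loss")
print("rewards:", rewards)
print("actor_loss:", actor_losses)
print("reward constant:", len(set(rewards)) == 1)
```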

I am trying to use DDPG with the `StockTradingEnv` provided by FinRL. The logged reward is the same for every episode (0.5398047 in each block above). I also looked at the buys, sells, and holds of the stocks after running the trained model:

```python
# e_trade_gym: the StockTradingEnv instance built for the trading period
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ddpg,
    environment=e_trade_gym,
)
```

The entire `df_actions` table is 0s from the first row onward. Performance is far worse than SAC, and training DDPG for 1,000 timesteps gives the same result as training it for 10,000 timesteps.
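An all-zero actions table is what you would expect if the trained policy only outputs small actions. This assumes your installed `StockTradingEnv.step()` scales each action in [-1, 1] by `hmax` and casts the result to an integer number of shares, which is worth confirming in `env_stocktrading.py` for your FinRL version. Below is a pure-Python sketch of that conversion with made-up action values and `hmax=100`. `actions_to_shares` is a hypothetical helper, not a FinRL function.

```python
def actions_to_shares(actions, hmax):
    """Map continuous actions in [-1, 1] to whole-share trade sizes.

    Positive values are buys, negative values are sells. int() truncates
    toward zero, so any |action| < 1 / hmax becomes a trade of 0 shares.
    """
    return [int(a * hmax) for a in actions]


# Hypothetical policy outputs for four stocks.
actions = [0.004, -0.009, 0.25, -0.5]
shares = actions_to_shares(actions, hmax=100)
print(shares)  # [0, 0, 25, -50]
```

If every row of `df_actions` is 0, it may help to log the raw outputs of the trained model's `predict` method and check whether they all fall inside (-1/hmax, 1/hmax).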

Am I missing something, or is this a hyperparameter problem?
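On the hyperparameter side, exploration noise is worth checking. DDPG's actor is deterministic, and in Stable-Baselines3 the `action_noise` argument of `DDPG` defaults to `None`. SB3 fills the first `learning_starts` steps with randomly sampled actions. After that, it explores only through the configured noise, and `DDPG_PARAMS` above sets none. Recent FinRL versions of `get_model` appear to accept an `"action_noise"` entry in `model_kwargs` (for example `"normal"` or `"ornstein_uhlenbeck"`), but confirm this against `models.py` in your install.

The stdlib-only sketch below shows what Gaussian action noise does: it perturbs each action, then clips it back into the [-1, 1] box. `noisy_action` and `sigma=0.1` are illustrative assumptions, not FinRL or SB3 code.

```python
import random


def noisy_action(action, sigma, rng):
    """Add zero-mean Gaussian noise to a deterministic action and clip it.

    The actor's output is perturbed for exploration, then kept inside
    the [-1, 1] action bounds.
    """
    return [max(-1.0, min(1.0, a + rng.gauss(0.0, sigma))) for a in action]


rng = random.Random(0)  # seeded so runs are reproducible
deterministic = [0.0, 0.0, 0.95, -1.0]
for step in range(3):
    print(step, [round(a, 3) for a in noisy_action(deterministic, 0.1, rng)])
```

For the real implementation, SB3 ships `NormalActionNoise` and `OrnsteinUhlenbeckActionNoise` in `stable_baselines3.common.noise`.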

@ndronen @lcavalie @dubodog @kruzel

AI4Finance-Foundation / FinRL

Same reward throughout training in DDPG #1233