AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
MIT License

Same reward throughout the training in DDPG #1233

Open Siddhu2502 opened 1 month ago

Siddhu2502 commented 1 month ago
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {
    "batch_size": 4096,
    "buffer_size": 1000000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
}

model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

#training DDPG Agent
trained_ddpg = agent.train_model(model=model_ddpg,

| time/              |           |
|    episodes        | 4         |
|    fps             | 29        |
|    time_elapsed    | 189       |
|    total_timesteps | 5608      |
| train/             |           |
|    actor_loss      | -11.6     |
|    critic_loss     | 0.0618    |
|    learning_rate   | 0.0003    |
|    n_updates       | 5507      |
|    reward          | 0.5398047 |
day: 1401, episode: 10
begin_total_asset: 100000.00
end_total_asset: 259256.35
total_reward: 159256.35
total_cost: 138.56
total_trades: 72857
Sharpe: 0.778
| time/              |           |
|    episodes        | 8         |
|    fps             | 29        |
|    time_elapsed    | 386       |
|    total_timesteps | 11216     |
| train/             |           |
|    actor_loss      | -3.94     |
|    critic_loss     | 0.0111    |
|    learning_rate   | 0.0003    |
|    n_updates       | 11115     |
|    reward          | 0.5398047 |
| time/              |           |
|    episodes        | 12        |
|    fps             | 28        |
|    time_elapsed    | 584       |
|    total_timesteps | 16824     |
| train/             |           |
|    actor_loss      | -1.22     |
|    critic_loss     | 0.0419    |
|    learning_rate   | 0.0003    |
|    n_updates       | 16723     |
|    reward          | 0.5398047 |

I am trying to use DDPG with the StockTradingEnv provided by FinRL. The logged reward is the same across all episodes, and the same problem shows up when I plot the buys, sells, and holds of the stocks:

df_account_value, df_actions = DRLAgent.DRL_prediction(
    environment = e_trade_gym)

The entire table is just 0s from the first row onwards. The performance is far worse than SAC, and training DDPG for 1,000 timesteps gives the same result as training for 10k timesteps.
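The all-zero table can also be confirmed without a plot by counting buys, sells, and holds directly. The following is only a minimal sketch: `summarize_actions` is a hypothetical helper, not part of FinRL, and it assumes each row of the actions table holds one signed share count per stock (positive = buy, negative = sell, 0 = hold).

```python
from typing import Dict, Iterable, Sequence, Union


def summarize_actions(
    rows: Iterable[Sequence[float]],
) -> Dict[str, Union[int, bool]]:
    """Count buy, sell, and hold entries in a table of per-stock actions.

    Hypothetical helper (not a FinRL function). Assumes positive values
    are buys, negative values are sells, and zeros are holds.
    """
    buys = sells = holds = 0
    for row in rows:
        for action in row:
            if action > 0:
                buys += 1
            elif action < 0:
                sells += 1
            else:
                holds += 1
    return {
        "buys": buys,
        "sells": sells,
        "holds": holds,
        # True when the agent never traded at all.
        "no_trades": buys == 0 and sells == 0,
    }


# Toy data standing in for the actions table: two days, three stocks, all holds.
example_rows = [[0, 0, 0], [0, 0, 0]]
print(summarize_actions(example_rows))
```

With the real output, something like `summarize_actions(df_actions.to_numpy().tolist())` should work, assuming `df_actions` is a pandas DataFrame with one numeric column per stock.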

Am I missing something? Is it a problem with the hyperparameters?

@ndronen @lcavalie @dubodog @kruzel