AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Model 4 Sac is not being trained correctly #55

Closed amihos closed 2 years ago

amihos commented 3 years ago

in FinRL_single_stock_trading.ipynb

After training Model 4 (SAC) finishes, I am getting this:

Model 4: SAC

```python
agent = DRLAgent(env=env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.00003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
```

```
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 3e-05, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
```

```python
trained_sac = agent.train_model(model=model_sac,
                                tb_log_name='sac',
                                total_timesteps=30000)
```

```
Logging to tensorboard_log/sac/sac_2
```

```
---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 81       |
|    time_elapsed    | 123      |
|    total_timesteps | 10064    |
| train/             |          |
|    actor_loss      | -940     |
|    critic_loss     | 1.6      |
|    ent_coef        | 0.135    |
|    ent_coef_loss   | 19       |
|    learning_rate   | 3e-05    |
|    n_updates       | 9963     |
---------------------------------
day: 2515, episode: 100
begin_total_asset: 100000.00
end_total_asset: 100000.00
total_reward: 0.00
total_cost: 0.00
total_trades: 0
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 80       |
|    time_elapsed    | 250      |
|    total_timesteps | 20128    |
| train/             |          |
|    actor_loss      | -510     |
|    critic_loss     | 147      |
|    ent_coef        | 0.182    |
|    ent_coef_loss   | 15.8     |
|    learning_rate   | 3e-05    |
|    n_updates       | 20027    |
---------------------------------
```
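The episode summary above shows the telltale symptom: `end_total_asset` equals `begin_total_asset` and `total_trades` is zero, so the agent went through the entire episode without placing a single trade. A minimal stdlib-only helper (hypothetical, not part of FinRL) sketches the check one might run when scanning these logs:

```python
def is_stalled_episode(begin_total_asset: float,
                       end_total_asset: float,
                       total_trades: int) -> bool:
    """Flag the symptom reported in this issue: the agent never trades
    and the portfolio value is unchanged over the whole episode."""
    return total_trades == 0 and begin_total_asset == end_total_asset

# The values from the episode summary above trip the check:
print(is_stalled_episode(100000.00, 100000.00, 0))  # True
```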

BruceYanghy commented 3 years ago

Can you try turning down the `learning_rate` from 0.00003 to 0.00001 and running again? I have noticed this happens with single-stock trading; it mostly occurs with DDPG, TD3, and SAC, and never with PPO or A2C.

amihos commented 3 years ago

Nothing changed, by the looks of it:

Model 4: SAC

```python
agent = DRLAgent(env=env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.00001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
```

```
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 1e-05, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
```

```python
trained_sac = agent.train_model(model=model_sac,
                                tb_log_name='sac',
                                total_timesteps=30000)
```

```
Logging to tensorboard_log/sac/sac_3
```

```
---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 86       |
|    time_elapsed    | 116      |
|    total_timesteps | 10064    |
| train/             |          |
|    actor_loss      | 61.4     |
|    critic_loss     | 28.2     |
|    ent_coef        | 0.11     |
|    ent_coef_loss   | 20.9     |
|    learning_rate   | 1e-05    |
|    n_updates       | 9963     |
---------------------------------
day: 2515, episode: 110
begin_total_asset: 100000.00
end_total_asset: 100000.00
total_reward: 0.00
total_cost: 0.00
total_trades: 0
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 85       |
|    time_elapsed    | 236      |
|    total_timesteps | 20128    |
| train/             |          |
|    actor_loss      | 77.8     |
|    critic_loss     | 0.762    |
|    ent_coef        | 0.122    |
|    ent_coef_loss   | 20       |
|    learning_rate   | 1e-05    |
|    n_updates       | 20027    |
---------------------------------
```
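To compare the two runs side by side, the Stable-Baselines3 console tables can be parsed with a few lines of stdlib Python. This is a sketch that assumes SB3's standard `| key | value |` table layout, not anything FinRL provides:

```python
def parse_sb3_table(text: str) -> dict:
    """Parse 'key | value' rows from a Stable-Baselines3 console table
    into a flat dict of strings, skipping section headers and rules."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the '----' rule lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 2 and cells[1]:  # skip 'time/', 'train/' headers
            stats[cells[0]] = cells[1]
    return stats

run = """
| time/              |       |
|    episodes        | 4     |
| train/             |       |
|    actor_loss      | -940  |
"""
print(parse_sb3_table(run))  # {'episodes': '4', 'actor_loss': '-940'}
```

With both runs parsed this way, it is easy to see that the losses do change between runs (so gradient updates are happening) even though the episode summaries never move off zero trades.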

amihos commented 3 years ago

Now that you mention it, I went back over the results from the other tests, and I can confirm the same thing happened with DDPG but not with TD3!

mokzheen commented 2 years ago

Hi, I have the same problem as well. Is there any way to solve this?

YangletLiu commented 2 years ago

We have updated a lot of the code. This issue no longer exists.

tzjZhengJie commented 7 months ago

I'm still getting the same issue: the total_trades, assets, and reward all remain the same.