AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Able to train, but output actions are not executed when predicting the model #873

Open veithly opened 1 year ago

veithly commented 1 year ago

Describe the bug: Training works, and each episode prints its results, but when I run prediction the model does not execute any trades.

To Reproduce Steps to reproduce the behavior:

train code

```python
PPO_model_kwargs = {
    "n_steps": 512,
    "ent_coef": 0.002,
    "learning_rate": 0.00025,
    "batch_size": 1024,
    "device": "cuda",
}
ppo_model = agent.get_model(
    "ppo",
    model_kwargs=PPO_model_kwargs,
    seed=0,
    tensorboard_log="PPO_VOTE_temp",
)
```

```python
train_model = agent.train_model(
    model=ppo_model,
    total_timesteps=len(macro_df) * 8000,
    tb_log_name="PPO_VOTE_temp",
)
```
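As a sanity check on the training side (a minimal sketch, assuming `agent.train_model` returns the underlying stable_baselines3 PPO object, which is the usual FinRL behavior; the file name is hypothetical), the trained weights can be saved and reloaded so that prediction is guaranteed to run against exactly the same policy:

```python
# Sketch: persist and reload the trained policy so prediction uses the same weights.
# Assumes train_model is a stable_baselines3 PPO instance; "trained_ppo_vote" is a
# hypothetical file name.
from stable_baselines3 import PPO

train_model.save("trained_ppo_vote")
train_model = PPO.load("trained_ppo_vote")
```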

train output

```
=================================
day: 45, episode: 8204
begin_total_asset: 100204894.97
end_total_asset: 110930148.45
total_reward: 10725253.48
total_cost: 9998.75
total_trades: 41
Sharpe: 4.197
```


```
-----------------------------------------
| time/                   |             |
|    fps                  | 290         |
|    iterations           | 719         |
|    time_elapsed         | 1267        |
|    total_timesteps      | 368128      |
| train/                  |             |
|    approx_kl            | 0.0006842093|
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.24       |
|    explained_variance   | -0.0117    |
|    learning_rate        | 0.00025     |
|    loss                 | 4.5e+04     |
|    n_updates            | 7180        |
|    policy_gradient_loss | 0.0012      |
|    reward               | 11.44137    |
|    std                  | 0.837       |
|    value_loss           | 9e+04       |
-----------------------------------------
```

predicting code

```python
e_trade_gym = StockTradingEnv(df=macro_df, num_stock_shares=num_stock_shares, **env_kwargs)
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=train_model,
    environment=e_trade_gym,
)
df_actions
```

predicting output

```
day: 45, episode: 2
begin_total_asset: 100189432.53
end_total_asset: 100000000.00
total_reward: -189432.53
total_cost: 0.00
total_trades: 0
Sharpe: -2.366
```

hit end!

```
0,2022-11-15,[0]
1,2022-11-16,[0]
2,2022-11-17,[0]
3,2022-11-18,[0]
4,2022-11-21,[0]
5,2022-11-22,[0]
6,2022-11-23,[0]
7,2022-11-24,[0]
8,2022-11-25,[0]
9,2022-11-28,[0]
10,2022-11-29,[0]
....
```
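To narrow down where the zero actions come from, one option (a minimal sketch; it assumes `train_model` is the underlying stable_baselines3 model and that `e_trade_gym` follows a gym-style reset API) is to query the policy directly on the trade environment's initial observation and compare the raw action with what DRL_prediction records:

```python
import numpy as np

# Sketch: ask the trained policy for an action directly, bypassing DRL_prediction,
# to see whether the raw actions are really zero or are being zeroed inside the env.
obs = e_trade_gym.reset()
if isinstance(obs, tuple):  # newer gymnasium-style reset returns (obs, info)
    obs = obs[0]

raw_action, _ = train_model.predict(np.array(obs), deterministic=True)
print("raw policy action:", raw_action)
print("action scaled by hmax:", raw_action * env_kwargs["hmax"])
```

If the raw action is already (near) zero, the problem is on the policy/training side; if it is nonzero but no trades are executed, the environment configuration is the more likely culprit.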

Athe-kunal commented 1 year ago

Hi @veithly, can you tell me which environment you are using? I was getting the same issue with one of the environments.

veithly commented 1 year ago

@Athe-kunal Like this. In the tech indicator list section I also add some other parameters; included here for reference:

```python
import numpy as np
from numpy import random as rd

stock_dimension = len(macro_df.tic.unique())
state_space = 1 + 2 * stock_dimension + len(teach_indicator_list) * stock_dimension
num_stock_shares = [0] * stock_dimension

# start each run with a random initial holding of 10-63 shares per stock
num_stock_shares = (
    np.array(num_stock_shares)
    + rd.randint(10, 64, size=np.array(num_stock_shares).shape)
).astype(np.int32)

env_kwargs = {
    "hmax": 10000 * 10000,
    "initial_amount": 10000 * 10000,
    "buy_cost_pct": [1 / 10000] * stock_dimension,
    "sell_cost_pct": [1.1 / 1000] * stock_dimension,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": teach_indicator_list,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
    "print_verbosity": 1,
}
```
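With a configuration like this, one quick thing to check (a minimal sketch using only the variables defined above, not FinRL API) is that the state the environment actually builds matches the computed state_space, since a mismatch between the training and trading environment setups could explain why the trained policy behaves differently at prediction time:

```python
# Sketch: verify the declared state_space against the state the env actually returns.
state = e_trade_gym.reset()
if isinstance(state, tuple):  # tolerate both gym and gymnasium reset signatures
    state = state[0]

print("declared state_space:", state_space)
print("actual state length :", len(state))
assert len(state) == state_space, "state_space mismatch between config and environment"
```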