AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org

ElegantRL Agent Outputs Inconsistent Actions #1085

Open julianzero opened 1 year ago

julianzero commented 1 year ago

An agent trained via ElegantRL, given the same input state, outputs different and even seemingly random actions on each prediction. Shouldn't an agent output deterministic actions after learning? Is this a bug?

Thanks.

mmmarchetti commented 1 year ago

The initial state of the model is initialized randomly. This randomness can lead to inconsistent results across different runs, especially if no random seed is provided in the code.

One way to get consistent results is to control this randomness by setting a fixed seed value. By doing this, the random processes will always produce the same result, making the results reproducible across different runs. Here's a way to fix this, using agent.get_model from DRLAgent:

agent = DRLAgent(env=env_train)
model_a2c = agent.get_model(model_name="a2c", model_kwargs=params, seed=1)

By setting seed=1, the randomness behind the model's initial state is controlled, ensuring that you get the same result every time you run the notebook, given that all other parameters and data remain the same.
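If you also want to pin the randomness that lives outside the model (weight initialization, exploration noise, data shuffling), you can additionally seed the global random number generators before creating the agent. This is only a sketch, not part of FinRL itself; it assumes a PyTorch backend, and the helper name set_global_seed is made up for illustration:

import random
import numpy as np
import torch

def set_global_seed(seed: int) -> None:
    # Illustrative helper (not a FinRL API): pins the Python, NumPy, and
    # PyTorch RNGs so that weight initialization and exploration noise repeat.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_global_seed(1)  # call this before DRLAgent(...) and get_model(...)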

Of course, setting a seed is just one step. To achieve maximum profit, you might still need to fine-tune other hyperparameters and ensure the model doesn't underfit or overfit the data.

Hope this helps!

julianzero commented 1 year ago

Thanks a lot. So is the randomness now part of the trained models? To fix it, do I need to re-train the models?

mmmarchetti commented 1 year ago

Thank you for reaching out! Randomness is inherent in all deep learning models; it is not a bug that needs fixing but a characteristic of the training algorithm. To achieve consistent results, you can retrain the model with a fixed seed. To learn more about this, please visit this.
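One more detail worth checking: if the trained policy is stochastic, it may sample a new action on every call even after seeded training. Evaluating the actor network directly in inference mode is one way to check this. Below is a minimal PyTorch sketch, where actor and state are placeholder names for the trained policy network and a single observation; if your actor samples internally, you would need its mean/deterministic output instead:

import torch

# Placeholders: `actor` is the trained policy network (a torch.nn.Module),
# `state` is a 1-D NumPy array holding one observation.
actor.eval()  # switch off dropout / batch-norm updates
state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
with torch.no_grad():
    action = actor(state_tensor)
print(action)  # repeating this with the same `state` should print the same action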

julianzero commented 1 year ago

Thanks a lot for your help! As far as I know, a model trained via FinRL/ElegantRL can only output an action (i.e., a trading quantity) given a state (i.e., the close price of a day). What about the next state, i.e., the price of the next day? Is there a way to predict the next day's price or the next state using FinRL?

mmmarchetti commented 1 year ago

When you input data from the previous day or the market's closing moment into the predictive model, it generates an output suggesting the action to take for the following day.
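Put differently, the agent maps a state to an action; it does not forecast prices. The next state is simply supplied by the environment, i.e., the market data of the following day. A rough sketch of that daily loop, using the Gym-style API that FinRL environments follow (env and model are placeholders, and the inference call depends on your backend):

# Placeholders: `env` is a FinRL trading environment (Gym-style API),
# `model` is the trained agent; `model.predict` stands in for whatever
# inference call your backend exposes.
state = env.reset()
done = False
while not done:
    action = model.predict(state)                 # trading decision for the next day
    state, reward, done, info = env.step(action)  # next day's prices come from the data, not the agent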

julianzero commented 1 year ago

TD3_PARAMS = {
    "batch_size": 100, "buffer_size": 1000000, "learning_rate": 0.001,
    "n_steps": 1024, "gamma": 0.99, "seed": 1, "net_dimension": 96,
    "target_step": 1000, "eval_gap": 6, "eval_times": 2,
}

agent = DRLAgent(env=env, price_array=price_array, tech_array=tech_array,
                 turbulence_array=turbulence_array)
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
cwd = env_kwargs.get('cwd', './trained_models' + '/MSFT0' + '/TD3/')
trained_td3 = agent.train_model(model=model_td3, cwd=cwd, total_timesteps=300)

account_value, actions_done = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance)

I still got inconsistent actions as outputs... Could you please help check what went wrong?

Many thanks.

mmmarchetti commented 1 year ago

With a fixed seed and the same inputs, the outputs should be consistent. To confirm, we need to run some tests.
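For instance, one quick test is to run the same prediction twice on the same trained model and data and compare the two action sequences. This sketch reuses the names from your snippet above (TRAINED_MODEL_DIR, env_instance) and assumes the returned actions can be converted to NumPy arrays:

import numpy as np

# Two inference runs with identical inputs, reusing the call from the snippet above.
# If the environment keeps internal state, recreate env_instance before the second run.
_, actions_run1 = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance)
_, actions_run2 = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance)

# If the policy is evaluated deterministically, the two runs should match.
print(np.allclose(np.array(actions_run1), np.array(actions_run2)))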