Closed snafu4 closed 2 years ago
clarification of my question
import gym
from gym_mtsim import (
    Timeframe, SymbolInfo,
    MtSimulator, OrderType, Order, SymbolNotFound, OrderNotFound,
    MtEnv,
    FOREX_DATA_PATH, STOCKS_DATA_PATH, CRYPTO_DATA_PATH, MIXED_DATA_PATH,
)
from stable_baselines3 import A2C
env = gym.make('forex-hedge-v0')
model = A2C('MultiInputPolicy', env, verbose=0)
model.learn(total_timesteps=1000)
observation = env.reset()
while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    if done:
        break
modelName = 'model_test'
model.save(f'{modelName}.zip')
env.render('advanced_figure', time_format="%Y-%m-%d")
###
# running the loop below should not result in different results each time, should it...
# ... but it does! What am I missing here?
###
model = A2C.load(modelName)
for i in range(5):
    observation = env.reset()
    while True:
        action, _states = model.predict(observation)
        observation, reward, done, info = env.step(action)
        if done:
            break
    env.render('advanced_figure', time_format="%Y-%m-%d")
'''
model.predict(observation, deterministic=True) appears to fix the problem
see https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html for specifics
'''
Any ideas why the code below (it is basically the same as the code in the README.md file) returns different results each time it is run after the model has been trained? The trained model is no longer being updated, and the input (trades) does not change between runs.
`model = A2C.load(modelName, env)`
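A likely explanation for the run-to-run differences: `model.predict()` in Stable-Baselines3 defaults to `deterministic=False`, so each action is sampled from the policy's action distribution even though the weights are frozen. The sketch below is a toy illustration of that difference using only the standard library. `ToyGaussianPolicy` and `run_episode` are hypothetical names made up for this example, not part of stable_baselines3 or gym_mtsim, and the fixed mean/std are assumptions standing in for what a trained network would compute from the observation.

```python
import random


class ToyGaussianPolicy:
    """Hypothetical stand-in for a trained stochastic policy (not the SB3 class)."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def predict(self, observation, deterministic: bool = False) -> float:
        # This toy ignores the observation; a real policy would derive
        # the distribution parameters from it.
        if deterministic:
            # Return the mode of the Gaussian: identical on every call.
            return self.mean
        # Draw a fresh sample: differs from call to call.
        return random.gauss(self.mean, self.std)


def run_episode(policy, steps: int = 5, deterministic: bool = False):
    """Collect the actions chosen over a fixed number of steps."""
    return [policy.predict(None, deterministic=deterministic) for _ in range(steps)]


policy = ToyGaussianPolicy(mean=0.3, std=0.5)

# Same "trained" policy, same inputs, three repeated runs of each kind.
stochastic_runs = [run_episode(policy) for _ in range(3)]
deterministic_runs = [run_episode(policy, deterministic=True) for _ in range(3)]

print("stochastic runs identical:   ", stochastic_runs[0] == stochastic_runs[1])
print("deterministic runs identical:", deterministic_runs[0] == deterministic_runs[1])
```

In the loop from the question, the corresponding change would be `model.predict(observation, deterministic=True)`, which matches the fix reported above.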