Closed Alex2782 closed 5 months ago
Hi @Alex2782 , is this PR ready to review?
The render function is not yet compatible with SB3 + Gymnasium. I don't know how to fix it: advanced_figure does not fit into the concept of Gymnasium.
train_SB3_gymnasium.py (without render) shows only the training result at the end.
@AminHP
5 months ago I already changed your 'README.ipynb'. I find 'Jupyter Notebook' terrible. I can no longer track what I have customized and what changes are still needed.
You could use VSCode for editing the notebooks. It highlights the changes and provides some other useful features.
I'll check the render function.
I just added a new comment here (#new line), and VS Code changes the whole file after saving.
my VS Code Extensions
OK, the changes look like this, but you have to scroll a lot and look closely; it is not solved very well.
The arrows for navigating through the changes do not work either.
Merely executing the notebook with IPython is enough to change a lot of the file's structure.
I have found my changes, but it takes longer than usual to search for them, and some changes are not comprehensible.
It seems that Gymnasium doesn't allow passing additional args to the render method. I managed to fix it with the code below; however, I don't know whether it is a correct solution according to Gymnasium's rules.
import numpy as np
import random
import torch
from stable_baselines3 import A2C
import gymnasium as gym
from gym_mtsim import (
    Timeframe, SymbolInfo,
    MtSimulator, OrderType, Order, SymbolNotFound, OrderNotFound,
    MtEnv,
    FOREX_DATA_PATH, STOCKS_DATA_PATH, CRYPTO_DATA_PATH, MIXED_DATA_PATH,
)
env_name = 'forex-hedge-v0'
env = gym.make(env_name)
# reproduce training and test
seed = 42
env.reset(seed=seed)
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
model = A2C('MultiInputPolicy', env, verbose=0)
model.learn(total_timesteps=10000)
observation, info = env.reset(seed=seed)
while True:
    action, _states = model.predict(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    if done:
        break
env.unwrapped.render('advanced_figure', time_format='%Y-%m-%d')
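A minimal sketch of why the env.unwrapped call above gets around the restriction, assuming the wrappers returned by gym.make expose a render() that takes no extra arguments. ToyEnv and ToyWrapper are hypothetical stand-ins written only for illustration; they are not Gymnasium or gym_mtsim classes.

```python
class ToyEnv:
    """Hypothetical stand-in for a base environment such as MtEnv."""

    def render(self, mode="human", **kwargs):
        # The base environment accepts a mode plus extra keyword arguments.
        return f"mode={mode}, kwargs={sorted(kwargs.items())}"

    @property
    def unwrapped(self):
        # A base environment is its own innermost environment.
        return self


class ToyWrapper:
    """Hypothetical stand-in for a wrapper whose render() takes no arguments."""

    def __init__(self, env):
        self.env = env

    def render(self):
        # Nothing is forwarded, so callers cannot pass a mode or kwargs here.
        return self.env.render()

    @property
    def unwrapped(self):
        # Delegate inward until the innermost environment is reached.
        return self.env.unwrapped


# Two wrapper layers around the base environment.
env = ToyWrapper(ToyWrapper(ToyEnv()))

try:
    env.render("advanced_figure", time_format="%Y-%m-%d")
    wrapper_accepts_args = True
except TypeError:
    # The wrapper's render() signature rejects the extra arguments.
    wrapper_accepts_args = False

# Going through `unwrapped` reaches the base render(), which accepts them.
result = env.unwrapped.render("advanced_figure", time_format="%Y-%m-%d")
print(wrapper_accepts_args, result)
```

The trade-off is that anything a wrapper would add around render() is skipped when the call goes directly to the innermost environment.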
I think that is okay
https://gymnasium.farama.org/api/env/#gymnasium.Env.render
"Compute the render frames" is not possible in the mtsim project (-> advanced_figure).
Great! Let's continue with this then. Thanks @Alex2782 for checking the docs.
Hi @Alex2782 When do you think you will have the time to finish this PR?
Hi @AminHP,
Which changes exactly are still necessary? Only adjusting the examples?
Yes, the examples should probably use the changes we discussed in the above code. It is also better to have them in ipynb format, like the one in anytrading.
Hi @AminHP,
I have checked and adjusted your file 'README.ipynb'.
Note: without the first change, this error occurs; it may also be due to my Python or NumPy version.
I think the documentation would also have to be adapted; for example, there is still the file README.md. Unfortunately, I currently have no time to check it more closely.
My example train_SB3_gymnasium.py does not use 'render()'. The 'matplotlib' output shows only a single model comparison: random actions, A2C and PPO.
Thanks for the changes. I will update and revise the docs later.
The last change is providing the examples in ipynb format.
Any updates @Alex2782 ?
Should I remove the file 'train_SB3_gymnasium.py'? I think the 'README.ipynb' is enough. Currently I have no time to convert my example to ipynb.
No, there is no need to remove the example. I will try to change it to ipynb version. The example is beneficial as it compares A2C and PPO.
@AminHP https://github.com/AminHP/gym-anytrading/blob/master/examples/SB3_a2c_ppo.ipynb
I had 'train_SB3_gymnasium.py' also in the project 'gym-anytrading' to test 'gymnasium' support.
hello ✋🏿 any updates on the merge of this feature?
I have finally fixed the minor issues, updated the examples, and merged the PR :)) Thanks a lot @Alex2782 for your contribution.
thanks guys, appreciate the effort 👍🏿
The render() function is not yet compatible with SB3 + Gymnasium. (https://github.com/DLR-RM/stable-baselines3/pull/1327)
Training works with: A2C, PPO, RecurrentPPO, TRPO (added gym-mtsim/examples/train_SB3_gymnasium.py)
Random actions vs. SB3 - Agents x [50K, 250K, 500K] learning_timesteps