Closed. Alex2782 closed this pull request 11 months ago.
The warning for flattened observations can be ignored; it's more of a best-practices recommendation for RL algorithms than an actual requirement.
Do you know what options are passed to reset()? If possible, it's best to change that for future compatibility, as the message says. Seeding can be done via the reset(seed) argument (directly calling seed() is deprecated).
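As a minimal sketch of the new-style API (illustrative only, not the actual gym-anytrading code), seeding through reset(seed=...) looks like this:

```python
import random

class MiniEnv:
    """Toy environment sketching the Gymnasium-style reset() API.
    Illustrative only -- not the actual gym-anytrading implementation."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, seed=None, options=None):
        # New style: the seed is passed to reset() instead of calling a
        # separate seed() method, and reset() returns (observation, info).
        if seed is not None:
            self._rng = random.Random(seed)
        observation = self._rng.random()
        info = {}
        return observation, info

env = MiniEnv()
obs1, _ = env.reset(seed=42)
obs2, _ = env.reset(seed=42)
assert obs1 == obs2  # same seed, same initial observation
```

The same pattern carries over to real Gymnasium environments: resetting twice with the same seed reproduces the initial observation.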
In the second part I tried the "REINFORCE" example with 'gym-anytrading' and added more changes.
https://gymnasium.farama.org/tutorials/training_agents/reinforce_invpend_gym_v26/
In the next few days I will try to add more examples and check some calculations ('max_possible_profit' / 'avg_profit') more precisely. 'total_reward' is often positive, but 'profit' is less than 1 (a loss?).
I am trying to take the 'observation' and 'rewards' functions from 'gym-mtsim'; with 'SB3' the models are not trained properly.
The 'evaluate' and 'predict' functions always return the same value (only 0 or 1 the whole time). I also tried the original versions without 'gymnasium' support in a virtual environment; the results were identical.
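One quick way to spot such a collapsed policy (a hypothetical helper, not part of this PR) is to measure how dominant the most frequent predicted action is:

```python
from collections import Counter

def top_action_share(actions):
    """Fraction taken by the most common action; a value near 1.0
    suggests the policy has collapsed to a single prediction."""
    counts = Counter(actions)
    return counts.most_common(1)[0][1] / len(actions)

# A policy that always predicts 0, like the behaviour described above:
assert top_action_share([0] * 100) == 1.0
# A healthier mix of actions:
assert top_action_share([0, 1] * 50) == 0.5
```

Running this over the actions returned by predict() across an evaluation episode makes it easy to distinguish "always the same output" from a genuinely varied policy.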
https://github.com/DLR-RM/stable-baselines3/pull/1327 (+ 'gymnasium' support)
For comparison, I could no longer successfully install 'stable_baselines' (as used in the example https://github.com/AminHP/gym-anytrading/blob/master/examples/a2c_quantstats.ipynb) under macOS 13.
https://github.com/Alex2782/gym-anytrading/blob/master/examples/SB3_quantstats.ipynb
'observation_space' is now initialized with 'INF = 1e10', as in the 'gym-mtsim' project, instead of 'np.inf' (see gym_anytrading/envs/trading_env.py).
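The effect of that change can be sketched with plain numpy (the shape below is hypothetical, not the actual one in trading_env.py):

```python
import numpy as np

INF = 1e10  # large but finite stand-in for np.inf, as in gym-mtsim

shape = (10, 2)  # hypothetical (window_size, n_features) shape
low = np.full(shape, -INF, dtype=np.float32)
high = np.full(shape, INF, dtype=np.float32)

# With finite bounds, (high - low) is finite everywhere, which wrappers
# such as observation normalization can rely on; np.inf bounds would
# make that range infinite.
assert np.all(np.isfinite(high - low))
```

These bounds arrays would then be passed as the low/high of the Box observation space.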
'SB3'-predict: A2C looks better now
https://github.com/Alex2782/gym-anytrading/blob/master/examples/SB3_quantstats.ipynb
Cool to see you got it working!
new example: train_SB3_gymnasium.py
Avg. rewards for supported SB3 agents x learning_timesteps = [5K, 10K, 25K], compared with "Random actions".
Hi guys. Apologies for my delayed participation in this PR. Thanks @Alex2782 for the PR, the changes are neat and concise. The additional examples are also quite useful; the project needed more examples with newer versions of stable-baselines.
Please let me know whenever you have finished. I'm looking forward to reviewing and testing it.
Hi @AminHP,
3 weeks ago I could train successfully with "REINFORCE": https://github.com/Alex2782/gym-anytrading/blob/master/examples/train_REINFORCE.py
The stable release of SB3 with "gymnasium" support has not been published yet. https://github.com/DLR-RM/stable-baselines3/pull/1327#issuecomment-1508626600
Hi @Alex2782 , it seems they have just released a new version of stable-baselines3 with Gymnasium support. Can we proceed based on that?
Have you seen this?
https://github.com/AminHP/gym-anytrading/issues/87 That's where I got the sample code from; not sure if this is the same SB3 you're talking about, though.
Hope that helps
Kind regards
On Sun, 25 Jun 2023 at 19:31, Elliot Tower wrote:
For what it’s worth we would love to have some sample code using gymnasium and SB3 for making a tutorial, if you can get it working let me know (I haven’t had time to mess with it myself)
Makes sense. I realized they actually updated their entire documentation website, so now everything is Gymnasium; therefore we can just link to that.
Hi @AminHP,
Maybe next week, or in two weeks, I will have more time. Would it be better to create a new pull request with fewer examples?
Hi @AminHP
I have tried the SB3 release version: pip install 'stable-baselines3[extra]'
A2C and PPO work so far; after 25K learning steps they achieve higher rewards than "Random Actions".
train_sb3_gymnasium.py
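The comparison against "Random Actions" boils down to averaging per-episode rewards. A minimal sketch (the numbers below are placeholders, not the actual benchmark results):

```python
def mean_reward(episode_rewards):
    """Average total reward over a list of evaluation episodes."""
    return sum(episode_rewards) / len(episode_rewards)

# Hypothetical per-episode reward totals -- illustrative only:
a2c_25k_rewards = [12.0, 9.5, 14.2, 11.1]
random_action_rewards = [1.1, -0.4, 0.6, 0.2]

# A trained agent should clear the random baseline on average:
assert mean_reward(a2c_25k_rewards) > mean_reward(random_action_rewards)
```

In the actual script, the two reward lists would be collected by rolling out the trained SB3 model and a random policy over the same evaluation episodes.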
Hi @Alex2782 , great job! Thanks for the update. I will review it soon and we will discuss it.
I have tested all the examples and everything works fine. Thanks again for your contribution and effort.
About the examples, I agree with you. I also think it is better to use Jupyter notebooks for the examples instead of Python scripts, so users can comprehend the examples more easily. Besides, we can merge the code presented in this PR and provide two examples: a2c_quantstats.ipynb (which is the updated version of my code, as you have already implemented in SB3_quantstats.ipynb) and reinforce.ipynb (which trains and evaluates multiple RL algorithms). Note that the random_actions example seems unnecessary, as it is available in the README file.
What's your opinion @Alex2782 ?
Hi @AminHP,
I changed to a 2D shape and removed _observation_cache.
Four months ago I first trained the changes with the REINFORCE example. The example did not work with the 2D shape:
https://gymnasium.farama.org/tutorials/training_agents/reinforce_invpend_gym_v26/
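For what it's worth, a 1-D view of a 2-D observation window can be obtained by flattening. A generic numpy sketch with hypothetical dimensions (not the PR's code):

```python
import numpy as np

window_size, n_features = 10, 2  # hypothetical dimensions
obs_2d = np.arange(window_size * n_features, dtype=np.float32).reshape(
    window_size, n_features
)

# Networks that expect a flat input vector (as in the REINFORCE
# tutorial) need the 2-D window flattened first:
obs_1d = obs_2d.reshape(-1)
assert obs_1d.shape == (window_size * n_features,)
```

Gymnasium also ships a FlattenObservation wrapper for exactly this purpose, which avoids changing the environment itself.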
I have also removed the random_actions and REINFORCE examples.
Unfortunately, I don't remember what exactly I did to the "ipynb" files; after execution the file was completely overwritten.
@AminHP
the render function has been revised:
https://gymnasium.farama.org/content/migration-guide/#environment-render
I am not sure if more changes are necessary.
In the 'gym-anytrading' project, rendering can still be executed if 'render_mode' is configured correctly.
In the 'gym-mtsim' project it was not compatible at all yet: https://github.com/AminHP/gym-mtsim/pull/40
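Under the revised API, the mode is fixed when the environment is created and render() no longer takes a mode argument. A minimal sketch (illustrative only, not the gym-anytrading implementation):

```python
class MiniRenderEnv:
    """Sketch of the Gymnasium render API: render_mode is chosen at
    construction time and render() takes no arguments.
    Illustrative only -- not the gym-anytrading implementation."""

    metadata = {"render_modes": ["human", "rgb_array"]}

    def __init__(self, render_mode=None):
        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode

    def render(self):
        if self.render_mode == "rgb_array":
            return [[(0, 0, 0)]]  # placeholder 1x1 black frame
        # "human" mode would draw to a window and return nothing
        return None

env = MiniRenderEnv(render_mode="rgb_array")
assert env.render() is not None
```

This is why a correctly configured 'render_mode' matters: an environment created without one has nothing to draw when render() is called.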
I have tested the examples and everything works fine. Thanks @Alex2782 for patiently addressing and resolving the matters. Congratulations on your first contribution to this repo!
I will update the docs and make some minor changes later.
I haven't tested everything yet; 'stocks-v0' with render_mode="human" already works.
There are still 2 warnings: