AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

Have you tried using multiple cpu on the Example here in A2C? #25

Closed. toksis closed this issue 3 years ago

toksis commented 3 years ago

I am trying to use multiple CPUs for the example provided at this link.

I tried to change the environment to use multiple CPUs:

env = DummyVecEnv([env_maker for i in range(16)])

But I have a problem with the done and info values returned by Stable Baselines: they seem to have turned into arrays.

The code below raises an error. Any suggestions, or has anyone done this before? It seems the LSTM policies in Stable Baselines behave like this.

#env = env_maker()
#observation = env.reset()

while True:
    #observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done:
        print("info:", info)
        break



ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
AminHP commented 3 years ago

Use done.all() in the if statement.
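
A minimal sketch of why the plain `if done:` fails once the environment is vectorized, and how `done.all()` differs from `done.any()`. The flag values are made up for illustration; a vectorized environment returns one flag per sub-environment.

```python
import numpy as np

# One done flag per sub-environment (illustrative values, not real output).
done = np.array([True, False, True])

try:
    if done:  # ambiguous: NumPy will not choose between any() and all()
        pass
except ValueError as exc:
    print("ValueError:", exc)

print(done.any())  # True  -> at least one environment has finished
print(done.all())  # False -> not every environment has finished yet
```

`done.all()` waits for every sub-environment to finish, which suits this thread because all 16 copies walk over the same data frame and should end on the same step.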

toksis commented 3 years ago

It works, but render_all does not work, even when I call it with env.env_method(method_name='render_all').

AminHP commented 3 years ago

Use this code instead:

for e in env.envs:
    # draw each sub-environment's full episode in its own figure
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()
AminHP commented 3 years ago

One thing to keep in mind: DummyVecEnv automatically resets each environment as soon as it is done, so the finished episode is gone by the time you try to render it. To prevent this, you can apply the code below after importing DummyVecEnv.

from copy import deepcopy
import numpy as np

def step_wait(self):
    for env_idx in range(self.num_envs):
        obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = \
            self.envs[env_idx].step(self.actions[env_idx])
        if self.buf_dones[env_idx]:
            # save final observation where user can get it, but do not reset
            self.buf_infos[env_idx]['terminal_observation'] = obs
            # obs = self.envs[env_idx].reset()
        self._save_obs(env_idx, obs)
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
            deepcopy(self.buf_infos))

DummyVecEnv.step_wait = step_wait
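
To illustrate what the automatic reset changes, here is a self-contained sketch that does not use Stable Baselines at all. `CountdownEnv`, `step_all`, and `run` are hypothetical names invented for this example; they only mimic the reset-on-done pattern described above.

```python
class CountdownEnv:
    """Hypothetical toy env: an episode lasts `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.tick = 0

    def reset(self):
        self.tick = 0
        return self.tick

    def step(self, action):
        self.tick += 1
        done = self.tick >= self.length
        return self.tick, 0.0, done, {}


def step_all(envs, actions, auto_reset):
    """Step every env once; optionally reset the ones that just finished."""
    observations, dones = [], []
    for env, action in zip(envs, actions):
        obs, _reward, done, info = env.step(action)
        if done and auto_reset:
            info["terminal_observation"] = obs  # keep the final observation
            obs = env.reset()                   # episode state is wiped here
        observations.append(obs)
        dones.append(done)
    return observations, dones


def run(auto_reset):
    envs = [CountdownEnv() for _ in range(2)]
    for env in envs:
        env.reset()
    for _ in range(3):
        observations, dones = step_all(envs, [0, 0], auto_reset)
    return observations, dones, [env.tick for env in envs]


print(run(auto_reset=True))   # ([0, 0], [True, True], [0, 0])
print(run(auto_reset=False))  # ([3, 3], [True, True], [3, 3])
```

With the reset in place, each env is already back at tick 0 when the caller sees `done`, so anything that inspects the env afterwards (a plot of the whole episode, for instance) finds a fresh episode rather than the finished one.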
toksis commented 3 years ago


Removing the .reset call in DummyVecEnv results in this error. It happens at timesteps = 32000.

   current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335

I think that is the length of the data frame.


 File "e:\ml\reinforcementlearning\tradeorig\stable-baselines\stable_baselines\common\vec_env\", line 150, in step
    return self.step_wait()
  File "e:\ML\reinforcementlearning\tradeorig\", line 29, in step_wait
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\", line 78, in step
    step_reward = self._calculate_reward(action)
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\", line 39, in _calculate_reward
    current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335
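
The traceback is consistent with a finished environment being stepped again: once the reset is removed, nothing rewinds the tick, so a loop that keeps calling step (training, for example) eventually indexes past the last row. A minimal sketch of that failure mode, using a hypothetical `PriceEnv` rather than the real gym-anytrading class:

```python
class PriceEnv:
    """Hypothetical stand-in for a trading env that walks over a price list."""

    def __init__(self, prices):
        self.prices = prices
        self.tick = 0

    def reset(self):
        self.tick = 0
        return self.prices[self.tick]

    def step(self, action):
        self.tick += 1
        price = self.prices[self.tick]  # IndexError once tick == len(prices)
        done = self.tick == len(self.prices) - 1
        return price, 0.0, done, {}


env = PriceEnv([10.0, 11.0, 12.0])
env.reset()

done = False
while not done:
    _obs, _reward, done, _info = env.step(0)

# The episode is over; without a reset, one more step runs off the data.
try:
    env.step(0)
    error = None
except IndexError as exc:
    error = exc

print(type(error).__name__)  # IndexError
```

A no-reset wrapper is therefore only safe in loops that stop stepping as soon as the episodes end, such as an evaluation loop that breaks on `done`.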
AminHP commented 3 years ago
from copy import deepcopy
import numpy as np
import pandas as pd

import gym
import gym_anytrading
import quantstats as qs

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

import matplotlib.pyplot as plt

df = gym_anytrading.datasets.STOCKS_GOOGL.copy()

window_size = 10
start_index = window_size
end_index = len(df)

env_maker = lambda: gym.make(
    'stocks-v0',  # env id inferred from the STOCKS_GOOGL dataset
    df = df,
    window_size = window_size,
    frame_bound = (start_index, end_index)
)

env = DummyVecEnv([env_maker for _ in range(16)])

policy_kwargs = dict(net_arch=[64, 'lstm', dict(vf=[128, 128, 128], pi=[64, 64])])
model = A2C('MlpLstmPolicy', env, verbose=1, policy_kwargs=policy_kwargs)

class DummyVecEnv2(DummyVecEnv):
    def step_wait(self):
        for env_idx in range(self.num_envs):
            obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = \
                self.envs[env_idx].step(self.actions[env_idx])
            if self.buf_dones[env_idx]:
                # save final observation where user can get it, then reset
                self.buf_infos[env_idx]['terminal_observation'] = obs
                # obs = self.envs[env_idx].reset()
            self._save_obs(env_idx, obs)
        return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
                deepcopy(self.buf_infos))

env = DummyVecEnv2([env_maker for i in range(16)])
observation = env.reset()

while True:
    # observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done.all():
        print("info:", info)
        break

for e in env.envs:
    # draw each sub-environment's full episode in its own figure
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()
toksis commented 3 years ago

You are a guru! It works now. So what you did was, after learning, override DummyVecEnv to remove the reset. Am I correct?

AminHP commented 3 years ago

Thanks man :)

Yeah, more or less, but I didn't override DummyVecEnv itself this time. I derived a new class from it (DummyVecEnv2) and overrode its step_wait method so that it no longer resets the environments.
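
The difference between the two approaches can be sketched without Stable Baselines. `Base` and `NoReset` are hypothetical stand-ins for DummyVecEnv and DummyVecEnv2, and the returned strings are placeholders for the real behaviour.

```python
class Base:
    """Hypothetical stand-in for a library class such as DummyVecEnv."""

    def step_wait(self):
        return "step + auto-reset"


class NoReset(Base):
    """Subclass that changes behaviour only for its own instances."""

    def step_wait(self):
        return "step only"


training_env = Base()
evaluation_env = NoReset()

print(training_env.step_wait())    # step + auto-reset
print(evaluation_env.step_wait())  # step only

# Monkey-patching, by contrast, rewrites the method for every instance.
original = Base.step_wait
Base.step_wait = NoReset.step_wait
patched_result = training_env.step_wait()
print(patched_result)              # step only
Base.step_wait = original          # undo the patch
```

Subclassing leaves the original class untouched, so training can keep the automatic reset while only the evaluation loop uses the no-reset variant.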