AminHP / gym-mtsim

A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)
MIT License

Question: Is there an 'easy' way to set Order properties? #4

Closed: snafu4 closed this issue 2 years ago

snafu4 commented 2 years ago

Specifically, how do I set 'volume', 'volume_step', 'volume_min' and 'volume_max' without creating a child of the Order class? I need to be able to set these values potentially for each order.

AminHP commented 2 years ago

These values (volume_step, volume_min, volume_max) are only used here and here.

I don't know exactly how you are working with the simulator or the env, but changing the values stored in the symbols_info attribute before creating a new order might work in some cases.

snafu4 commented 2 years ago

env is equivalent to:

env = MtEnv(
    original_simulator=sim,
    trading_symbols=['GBPCAD', 'EURUSD', 'USDJPY'],
    window_size=10,
    # time_points=[desired time points ...],
    hold_threshold=0.5,
    close_threshold=0.5,
    fee=lambda symbol: 
...

Using the following right after the above:

max = 0.01
min = 0.01
step = 0.01
env.original_simulator.symbols_info[symbol].volume_max = max
env.original_simulator.symbols_info[symbol].volume_min = min
env.original_simulator.symbols_info[symbol].volume_step = step

sets the values (verified in code right after) but doesn't actually change the order volume (the volume is all over the place).

Can you expand on '... might work in some cases...'? Why some cases and not others?

AminHP commented 2 years ago

For example, this code works. But inside an A2C model, you can't apply it.

env = MtEnv(...)

env.simulator.symbols_info[symbol1].volume_max = a1
env.simulator.symbols_info[symbol2].volume_max = a2
env.step(...)

env.simulator.symbols_info[symbol1].volume_max = b1
env.simulator.symbols_info[symbol2].volume_max = b2
env.step(...)

env.simulator.symbols_info[symbol1].volume_max = c1
env.simulator.symbols_info[symbol2].volume_max = c2
env.step(...)

snafu4 commented 2 years ago

I am using A2C.

The _get_modified_volume() method in the MtEnv class limits the volume to between volume_min and volume_max before the order is sent to MtSimulator for execution.
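
Roughly, I understand it to do something like this (a sketch, not the exact library code; in particular, the volume_step rounding is my guess at how that value is used):

import numpy as np

def _get_modified_volume(self, symbol: str, volume: float) -> float:
    si = self.simulator.symbols_info[symbol]
    v = abs(volume)
    v = np.clip(v, si.volume_min, si.volume_max)    # clamp into the allowed range
    v = round(v / si.volume_step) * si.volume_step  # snap to a multiple of volume_step (my assumption)
    return v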

Can you please explain your statement above "...But inside an A2C model, you can't apply it. ..."? Also, why would the model being used restrict the use of the volume_* values?

Thanks

AminHP commented 2 years ago

About the statement "...But inside an A2C ...": I mean this code works because you can change the volume_* values before calling the env's step method. But when you are using an A2C model, it calls the step method inside its own routines, so we cannot change the volume_* values with the code I posted earlier.

About the "why would the model ..." question: This is something related to MetaTrader. It restricts these values, so we cannot set any volume we want.

In case you don't want these restrictions applied, just remove or override the _check_volume and _get_modified_volume methods. For example:

class MyMtEnv(MtEnv):
    def _check_volume(self, symbol: str, volume: float) -> None:
        pass  # skip the volume range check entirely

    def _get_modified_volume(self, symbol: str, volume: float) -> float:
        si = self.simulator.symbols_info[symbol]  # symbol info, in case you want a custom rule
        v = abs(volume)
        return v  # use the raw volume without clipping or rounding

snafu4 commented 2 years ago

If it is not possible to limit the volume (lot size in forex), how else can the risk be mitigated within this framework?

AminHP commented 2 years ago

Shouldn't the RL model itself learn to manage the risk? I think risk management is part of the prediction algorithm. I mean, when we give an action to a GymEnv, it should only check whether the action complies with the environmental constraints and then apply it. If the action conflicts with those constraints, the GymEnv can either ignore it or modify it; the latter was chosen in MtEnv. Anything beyond that, like risk management and per-order volume restrictions, should be applied before passing the actions to the step method.

Since stable-baselines does not support such a thing, a simple way to do it is to call a function at the beginning of the step method:

from typing import Any, Dict, Tuple

import numpy as np

from gym_mtsim import MtEnv


class MyMtEnv(MtEnv):
    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        return super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            if self._current_tick > 20:  # or some other condition according to your risk management algorithm
                volume = np.clip(volume, -1.2, 1.2)
            symbol_action[-1] = volume
        return action
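
For example, training would then look something like this (a rough sketch; sim is the MtSimulator instance from your setup, and the symbols, window size, and timestep count are placeholders):

from stable_baselines3 import A2C

env = MyMtEnv(
    original_simulator=sim,
    trading_symbols=['EURUSD', 'GBPJPY'],
    window_size=10,
    symbol_max_orders=2,
)
model = A2C('MultiInputPolicy', env, verbose=0)
model.learn(total_timesteps=10000)  # _modify_action now runs on every internal step() call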

snafu4 commented 2 years ago

I will give the code a try. Thanks.

You bring up a good point w.r.t. the RL model learning to manage the risk (that's what a proper reward is supposed to control, right?). However, I trade with a proprietary firm that restricts certain aspects of my account during a trading session (for example, if my balance goes more than x% below its daily starting amount, all my trades will be closed automatically).

This means that I have to watch my intra-day drawdown. One of the ways I control this is by restricting the volume/lot size.

This is why the control I've asked about in this thread is critical for me.

AminHP commented 2 years ago

Yes, an ideal RL agent should be able to control everything, but it is difficult to make such an agent. By the way, I like your approach. Managing some things alongside the RL model is an excellent way to combine the knowledge of a human expert with an AI agent. It helps achieve better results in less time.

Let me know if my last piece of code was proper for your requirement.

snafu4 commented 2 years ago

I will try and debug asap but maybe you can quickly see the problem:

class MyCustomEnv(MtEnv):

    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            if self._current_tick > 0:  # or some other conditions according to your risk management algorithm
                volume = min(volume, 0.01)
            symbol_action[-1] = volume

I get errors below when:

env = MyCustomEnv(original_simulator=sim,
                  trading_symbols=['EURUSD', 'GBPJPY'],
                  window_size=10,
                  symbol_max_orders=2,
                  multiprocessing_processes=4,
                  )

is run.

Process SpawnPoolWorker-16:
Traceback (most recent call last):
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 315, in _bootstrap
    self.run()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\pool.py", line 114, in worker
    task = get()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\queues.py", line 361, in get
    return _ForkingPickler.loads(res)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 327, in loads
    return load(file, ignore, **kwds)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 313, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 525, in load
    obj = StockUnpickler.load(self)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 515, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>

AminHP commented 2 years ago

It seems there is a problem with pathos and multiprocessing. Please try multiprocessing_processes=None for now until I fix it.
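
For example, your earlier call would become (everything else unchanged):

env = MyCustomEnv(original_simulator=sim,
                  trading_symbols=['EURUSD', 'GBPJPY'],
                  window_size=10,
                  symbol_max_orders=2,
                  multiprocessing_processes=None,  # disable the pathos worker pool for now
                  )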

snafu4 commented 2 years ago

FYI: multiprocessing_processes=None

results in


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<timed exec> in <module>

~\AppData\Local\Temp/ipykernel_21208/3706694186.py in run_Model(i, total_timesteps, fileName, saveModel)
      6     model = A2C('MultiInputPolicy', env, verbose=0)
      7 #     model.learn(total_timesteps=total_timesteps, callback=[eval_callback])
----> 8     model.learn(total_timesteps=total_timesteps)
      9 
     10     observation = env.reset()

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\a2c\a2c.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    190     ) -> "A2C":
    191 
--> 192         return super(A2C, self).learn(
    193             total_timesteps=total_timesteps,
    194             callback=callback,

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    235         while self.num_timesteps < total_timesteps:
    236 
--> 237             continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
    238 
    239             if continue_training is False:

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
    176                 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
    177 
--> 178             new_obs, rewards, dones, infos = env.step(clipped_actions)
    179 
    180             self.num_timesteps += env.num_envs

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py in step(self, actions)
    160         """
    161         self.step_async(actions)
--> 162         return self.step_wait()
    163 
    164     def get_images(self) -> Sequence[np.ndarray]:

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py in step_wait(self)
     41     def step_wait(self) -> VecEnvStepReturn:
     42         for env_idx in range(self.num_envs):
---> 43             obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
     44                 self.actions[env_idx]
     45             )

d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\monitor.py in step(self, action)
     88         if self.needs_reset:
     89             raise RuntimeError("Tried to step environment that needs reset")
---> 90         observation, reward, done, info = self.env.step(action)
     91         self.rewards.append(reward)
     92         if done:

~\AppData\Local\Temp/ipykernel_21208/4242459879.py in step(self, action)
     71     def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
     72         action = self._modify_action(action)
---> 73         super().step(action)
     74 
     75     def _modify_action(self, action: np.ndarray) -> np.ndarray:

D:\Python\pyenv\gym-mtsim\gym_mtsim\envs\mt_env.py in step(self, action)
    108 
    109     def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
--> 110         orders_info, closed_orders_info = self._apply_action(action)
    111 
    112         self._current_tick += 1

D:\Python\pyenv\gym-mtsim\gym_mtsim\envs\mt_env.py in _apply_action(self, action)
    135 
    136         for i, symbol in enumerate(self.trading_symbols):
--> 137             symbol_action = action[k*i:k*(i+1)]
    138             close_orders_logit = symbol_action[:-2]
    139             hold_logit = symbol_action[-2]

TypeError: 'NoneType' object is not subscriptable

AminHP commented 2 years ago

I updated this code

snafu4 commented 2 years ago

The new code works (thanks) but it does not appear to accomplish the original goal. The volume is not affected by its inclusion.

class MyMtEnv(MtEnv):
    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        return super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
#             if self._current_tick > 20:  # or some other conditions according to your risk management algorithm
            volume = min(volume, 0.01)
            symbol_action[-1] = volume
        return action

The volume should be fixed at 0.01.

[screenshot: order volumes are not fixed at 0.01]

AminHP commented 2 years ago

My code had a problem and I fixed it. Try the new code and send me your complete code if the problem still exists. Make sure you are using MyMtEnv not MtEnv.

snafu4 commented 2 years ago

You caught my mistake ('Make sure you are using MyMtEnv not MtEnv'). I wasn't actually testing your updated code.

The new code appears to work for restricting volume!! ... but only if

multiprocessing_processes = None

When it is set to 1 or 4, I get:

AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>
Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 315, in _bootstrap
    self.run()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\pool.py", line 114, in worker
    task = get()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\queues.py", line 361, in get
    return _ForkingPickler.loads(res)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 327, in loads
    return load(file, ignore, **kwds)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 313, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 525, in load
    obj = StockUnpickler.load(self)
  File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 515, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>

AminHP commented 2 years ago

As I said earlier, it seems to be a problem with pathos. I don't know the exact cause, but it can be worked around using the code below.


class MyCustomEnv(MtEnv):
    def __init__(self, *args, **kwargs):
        # Create the worker pool here instead of letting MtEnv.__init__ do it
        # (Pool is the same multiprocessing pool class that MtEnv uses internally, from pathos)
        multiprocessing_processes = kwargs.pop('multiprocessing_processes', None)
        super().__init__(*args, **kwargs)
        self.multiprocessing_pool = Pool(multiprocessing_processes) if multiprocessing_processes else None

    def step(self, action: np.ndarray):
        action = self._modify_action(action)
        return super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            volume = np.clip(volume, -0.01, 0.01)  # cap each order's volume at 0.01 lots
            symbol_action[-1] = volume
        return action
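
With this subclass, constructing the env should look the same as before; the only difference is where the pool gets created (a sketch, reusing the parameters from your earlier snippet):

env = MyCustomEnv(original_simulator=sim,
                  trading_symbols=['EURUSD', 'GBPJPY'],
                  window_size=10,
                  symbol_max_orders=2,
                  multiprocessing_processes=4,  # the pool is now created in MyCustomEnv.__init__
                  )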