Closed: snafu4 closed this issue 2 years ago
env is equivalent to:

```python
env = MtEnv(
    original_simulator=sim,
    trading_symbols=['GBPCAD', 'EURUSD', 'USDJPY'],
    window_size=10,
    # time_points=[desired time points ...],
    hold_threshold=0.5,
    close_threshold=0.5,
    fee=lambda symbol: ...
)
```
Running the following immediately after the above (variables renamed to avoid shadowing the `max`/`min` builtins):

```python
vol_max = 0.01
vol_min = 0.01
vol_step = 0.01
env.original_simulator.symbols_info[symbol].volume_max = vol_max
env.original_simulator.symbols_info[symbol].volume_min = vol_min
env.original_simulator.symbols_info[symbol].volume_step = vol_step
```

sets the values (verified in code right after), but it doesn't actually change the order volume (the volume is all over the place).
Can you expand on '... might work in some cases...'? Why some cases and not others?
For example, this code works. But inside an A2C model, you can't apply it.
```python
env = MtEnv(...)

env.simulator.symbols_info[symbol1].volume_max = a1
env.simulator.symbols_info[symbol2].volume_max = a2
env.step(...)

env.simulator.symbols_info[symbol1].volume_max = b1
env.simulator.symbols_info[symbol2].volume_max = b2
env.step(...)

env.simulator.symbols_info[symbol1].volume_max = c1
env.simulator.symbols_info[symbol2].volume_max = c2
env.step(...)
```
I am using A2C.
The _get_modified_volume() method in the MtEnv class limits the volume to between volume_min and volume_max before the order is sent to MtSimulator for execution.
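For intuition, the clamping described above behaves roughly like the sketch below (the function name, and the exact rounding of the step snap, are illustrative assumptions, not the actual gym-mtsim implementation):

```python
# Illustrative sketch (NOT the actual gym-mtsim code) of how a requested
# volume is forced into a symbol's [volume_min, volume_max] range and
# snapped to the volume_step grid before an order is executed.
def clamp_volume(volume: float, volume_min: float, volume_max: float, volume_step: float) -> float:
    v = abs(volume)
    v = max(volume_min, min(v, volume_max))   # clip into the allowed range
    v = round(v / volume_step) * volume_step  # snap to the step grid
    return v

print(clamp_volume(0.037, 0.01, 1.0, 0.01))  # -> 0.04 (snapped to the step)
print(clamp_volume(2.5, 0.01, 1.0, 0.01))    # -> 1.0 (capped at volume_max)
```

This is why setting symbols_info values changes what the clamp does, but an out-of-range request from the model still gets silently modified rather than rejected.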
Can you please explain your statement above "...But inside an A2C model, you can't apply it. ..."? Also, why would the model being used restrict the use of the volume_* values?
Thanks
About the statement "...But inside an A2C ...": I mean that code works because you can change the volume_* values before calling the step method of the env. But when you are using an A2C model, it calls the step method inside its own routines, so we cannot change the volume_* values with the code I posted earlier.
About the "why would the model ..." question: This is something related to the MetaTrader. They restrict these values and we cannot set any volume we want.
In case you don't want these restrictions to be applied, just remove or override the _check_volume and _get_modified_volume methods. For example:
```python
class MyMtEnv(MtEnv):
    def _check_volume(self, symbol: str, volume: float) -> None:
        pass  # skip the min/max/step validation entirely

    def _get_modified_volume(self, symbol: str, volume: float) -> float:
        return abs(volume)  # use the requested volume as-is
```
If it is not possible to limit the volume (lot size in forex), how else can the risk be mitigated within this framework?
Shouldn't the RL model itself learn to manage the risk? I think risk management is part of the prediction algorithm. When we give an action to a GymEnv, it should only check whether that action conforms to the environmental constraints and then apply it. If the action conflicts with those constraints, the GymEnv can either ignore it or modify it; the latter was chosen in MtEnv. Anything beyond that, like risk management and per-order volume restrictions, should be applied before passing the actions to the step method.
Since stable-baselines does not support such a thing, a simple way to do it is to call a function at the beginning of the step method:
```python
class MyMtEnv(MtEnv):
    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        return super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            if self._current_tick > 20:  # or some other condition from your risk management algorithm
                volume = np.clip(volume, -1.2, 1.2)
            symbol_action[-1] = volume
        return action
```
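For anyone following along, the `k = self.symbol_max_orders + 2` slicing assumes each symbol's slice of the flat action vector is [close-order logits..., hold logit, volume], matching _apply_action in mt_env.py. A standalone illustration with dummy numbers (the symbol names and values are arbitrary):

```python
import numpy as np

# Standalone illustration of the per-symbol action layout used above:
# each symbol contributes symbol_max_orders close-logits, one hold logit,
# and one volume entry, so each slice has k = symbol_max_orders + 2 values.
symbol_max_orders = 2
trading_symbols = ['EURUSD', 'GBPJPY']  # illustrative symbols
k = symbol_max_orders + 2

action = np.arange(k * len(trading_symbols), dtype=float)  # dummy flat action vector
for i, symbol in enumerate(trading_symbols):
    symbol_action = action[k*i:k*(i+1)]
    close_logits = symbol_action[:-2]   # logits for closing existing orders
    hold_logit = symbol_action[-2]      # logit for holding (no new order)
    volume = symbol_action[-1]          # requested volume for a new order
    print(symbol, close_logits, hold_logit, volume)
```

Because `symbol_action` is a view into `action`, writing `symbol_action[-1] = volume` in the method above mutates the original array in place.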
I will give the code a try. Thanks.
You bring up a good point w.r.t. the RL model learning to manage the risk (that's what a proper reward is supposed to control, right?). However, I trade with a proprietary firm that restricts certain aspects of my account during a trading session (for example, if my balance drops more than x% below its daily starting amount, all my trades are closed automatically).
This means that I have to watch my intra-day drawdown. One of the ways I control this is by restricting the volume/lot size.
This is why the control I've asked about in this thread is critical for me.
Yes, an ideal RL agent should be able to control everything, but it is difficult to make such an agent. By the way, I like your attitude. Managing some stuff besides the RL model is an excellent way to combine the knowledge of a human expert and an AI agent. It helps to achieve better results in less time.
Let me know if my last piece of code was proper for your requirement.
I will try and debug asap, but maybe you can quickly spot the problem:

```python
class MyCustomEnv(MtEnv):
    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            if self._current_tick > 0:  # or some other condition from your risk management algorithm
                volume = min(volume, 0.01)
            symbol_action[-1] = volume
```
I get the errors below when:

```python
env = MyCustomEnv(
    original_simulator=sim,
    trading_symbols=['EURUSD', 'GBPJPY'],
    window_size=10,
    symbol_max_orders=2,
    multiprocessing_processes=4,
)
```

is run.
Process SpawnPoolWorker-16:
Traceback (most recent call last):
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 315, in _bootstrap
self.run()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\pool.py", line 114, in worker
task = get()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\queues.py", line 361, in get
return _ForkingPickler.loads(res)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 327, in loads
return load(file, ignore, **kwds)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 313, in load
return Unpickler(file, ignore=ignore, **kwds).load()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 525, in load
obj = StockUnpickler.load(self)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 515, in find_class
return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>
It seems there is a problem with pathos and multiprocessing. Please try multiprocessing_processes=None for now until I fix it.
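For context (editor's note, not part of the original fix): the AttributeError is the usual spawn-pickling failure. Spawned workers re-import __main__ and look the class up by name, but a class defined in a notebook cell lives in an interactive __main__ that the worker process cannot reproduce. A minimal illustration of the mechanism with plain pickle (the class name is a stand-in):

```python
import pickle

# A class defined at the top level of a script or notebook is pickled
# "by reference": the payload stores only the module ('__main__') and
# the class name, not the class definition itself.
class MyCustomEnvDemo:  # stand-in for MyCustomEnv
    pass

payload = pickle.dumps(MyCustomEnvDemo())
print(b'MyCustomEnvDemo' in payload)  # True: only the name is stored
```

A worker can unpickle this only if it can import the module and find the attribute, which is why defining the class in a real importable module (instead of a notebook cell) typically avoids the error.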
FYI: multiprocessing_processes=None results in:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<timed exec> in <module>
~\AppData\Local\Temp/ipykernel_21208/3706694186.py in run_Model(i, total_timesteps, fileName, saveModel)
6 model = A2C('MultiInputPolicy', env, verbose=0)
7 # model.learn(total_timesteps=total_timesteps, callback=[eval_callback])
----> 8 model.learn(total_timesteps=total_timesteps)
9
10 observation = env.reset()
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\a2c\a2c.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
190 ) -> "A2C":
191
--> 192 return super(A2C, self).learn(
193 total_timesteps=total_timesteps,
194 callback=callback,
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
235 while self.num_timesteps < total_timesteps:
236
--> 237 continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
238
239 if continue_training is False:
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
176 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
177
--> 178 new_obs, rewards, dones, infos = env.step(clipped_actions)
179
180 self.num_timesteps += env.num_envs
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py in step(self, actions)
160 """
161 self.step_async(actions)
--> 162 return self.step_wait()
163
164 def get_images(self) -> Sequence[np.ndarray]:
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py in step_wait(self)
41 def step_wait(self) -> VecEnvStepReturn:
42 for env_idx in range(self.num_envs):
---> 43 obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
44 self.actions[env_idx]
45 )
d:\python\pyenv\gym-mtsim\lib\site-packages\stable_baselines3\common\monitor.py in step(self, action)
88 if self.needs_reset:
89 raise RuntimeError("Tried to step environment that needs reset")
---> 90 observation, reward, done, info = self.env.step(action)
91 self.rewards.append(reward)
92 if done:
~\AppData\Local\Temp/ipykernel_21208/4242459879.py in step(self, action)
71 def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
72 action = self._modify_action(action)
---> 73 super().step(action)
74
75 def _modify_action(self, action: np.ndarray) -> np.ndarray:
D:\Python\pyenv\gym-mtsim\gym_mtsim\envs\mt_env.py in step(self, action)
108
109 def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
--> 110 orders_info, closed_orders_info = self._apply_action(action)
111
112 self._current_tick += 1
D:\Python\pyenv\gym-mtsim\gym_mtsim\envs\mt_env.py in _apply_action(self, action)
135
136 for i, symbol in enumerate(self.trading_symbols):
--> 137 symbol_action = action[k*i:k*(i+1)]
138 close_orders_logit = symbol_action[:-2]
139 hold_logit = symbol_action[-2]
TypeError: 'NoneType' object is not subscriptable
The new code works (thanks), but it does not appear to accomplish the original goal: the volume is not affected by its inclusion.
```python
class MyMtEnv(MtEnv):
    def step(self, action: np.ndarray) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        action = self._modify_action(action)
        super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            # if self._current_tick > 20:  # or some other condition from your risk management algorithm
            volume = min(volume, 0.01)
            symbol_action[-1] = volume
        return action
```
The volume should be fixed at 0.01.
My code had a problem and I fixed it. Try the new code, and send me your complete code if the problem still exists. Make sure you are using MyMtEnv, not MtEnv.
You caught my mistake ('Make sure you are using MyMtEnv not MtEnv'). I wasn't actually testing your updated code.
The new code appears to work for restricting volume!! ... but only if multiprocessing_processes = None.
When it is set to 1 or 4, I get:
AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 315, in _bootstrap
self.run()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\pool.py", line 114, in worker
task = get()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\multiprocess\queues.py", line 361, in get
return _ForkingPickler.loads(res)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 327, in loads
return load(file, ignore, **kwds)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 313, in load
return Unpickler(file, ignore=ignore, **kwds).load()
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 525, in load
obj = StockUnpickler.load(self)
File "d:\python\pyenv\gym-mtsim\lib\site-packages\dill\_dill.py", line 515, in find_class
return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'MyCustomEnv' on <module '__main__' (built-in)>
As I said earlier, it seems to be a problem with pathos. I don't know the exact cause, but it can be worked around with the code below.
```python
class MyCustomEnv(MtEnv):
    def __init__(self, *args, **kwargs):
        # create the pool here instead of letting MtEnv.__init__ do it
        multiprocessing_processes = kwargs.pop('multiprocessing_processes', None)
        super().__init__(*args, **kwargs)
        self.multiprocessing_pool = Pool(multiprocessing_processes) if multiprocessing_processes else None

    def step(self, action: np.ndarray):
        action = self._modify_action(action)
        return super().step(action)

    def _modify_action(self, action: np.ndarray) -> np.ndarray:
        k = self.symbol_max_orders + 2
        for i, symbol in enumerate(self.trading_symbols):
            symbol_action = action[k*i:k*(i+1)]
            volume = symbol_action[-1]
            volume = np.clip(volume, -0.01, 0.01)
            symbol_action[-1] = volume
        return action
```
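One detail worth noting (editor's observation): the earlier suggestion min(volume, 0.01) only caps the buy side; a negative volume, which gym-mtsim treats as a sell order, passes through unchanged. That is why np.clip(volume, -0.01, 0.01) is the safer bound here:

```python
import numpy as np

# min() caps only the positive side; np.clip bounds both signs,
# which matters because negative volumes encode sell orders.
buy, sell = 0.5, -0.5
print(min(buy, 0.01), np.clip(buy, -0.01, 0.01))    # 0.01 0.01
print(min(sell, 0.01), np.clip(sell, -0.01, 0.01))  # -0.5 -0.01
```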
Specifically, how do I set volume, volume_step, volume_min, and volume_max without creating a child of the Order class? I need to be able to set these values, potentially per order.