AminHP / gym-mtsim

A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)
MIT License

Composite Spaces #46

Open AtagyanAG opened 7 months ago

AtagyanAG commented 7 months ago

Thanks to the author for the excellent repository; it helped me open up a new direction to explore.

I am interested in how to build the environment. In practice, many traders use multiple timeframes to decide whether to open a position. For example, I want to analyze the last 200 bars of the H4 timeframe and the last 100 bars of the M5 timeframe, and I also want to pass bid and ask data to the environment separately. I wrote several indicators in MQL5 that export all the necessary data to a SQLite database, which makes it convenient to build the sequences that form the environment.

The stable-baselines3 documentation states that it is possible to create a composite observation space. Something like this:

```python
self.observation_space = spaces.Dict({
    'SlowTF': spaces.Box(low=0, high=1, shape=cfg.slowTF_shape, dtype=np.float32),
    'FastTF': spaces.Box(low=0, high=1, shape=cfg.fastTF_shape, dtype=np.float32),
    'Tick': spaces.Box(low=0, high=1, shape=cfg.tick_shape, dtype=np.float32)
})
```

How can this approach be implemented in gym-mtsim or perhaps in gym-anytrading?

The number of parameters in gym-mtsim looks unnecessary to me. unit, balance, equity, margin, leverage, etc. only complicate the learning process and can be handled directly in the MetaTrader 5 terminal using MQL5. If the model is trained only to find entry and exit points for a position, that is already enough to use it. MQL5 has built-in support for the ONNX format, so the trained model can be exported to ONNX and used directly in the code of an advisor or indicator as an imported function.

AminHP commented 6 months ago

Hi @AtagyanAG,

I have been thinking about multiple timeframes. My idea is that giving only the M1 timeframe data to the env might be enough for the agent. Human experts analyze multiple timeframes like H4 and M5 because it is difficult for them to watch everything in full detail. An RL agent, however, should be able to analyze the M1 timeframe and extract the patterns of the other timeframes as well.
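To make the point about M1 data concrete: higher-timeframe bars are an aggregation of lower-timeframe bars, so the information is already present in the M1 series. The following is only a minimal sketch of that aggregation, not code from gym-mtsim. The `Bar` tuple and the `resample` helper are hypothetical names, and the sketch assumes the M1 bars are contiguous and that the series starts on a timeframe boundary.

```python
from collections import namedtuple

# Hypothetical bar record; `time` is counted in minutes for simplicity.
Bar = namedtuple("Bar", ["time", "open", "high", "low", "close"])


def resample(bars, factor):
    """Aggregate each run of `factor` consecutive bars into one bar.

    Assumes `bars` is contiguous and starts on a boundary of the target
    timeframe. An incomplete trailing group is dropped.
    """
    out = []
    for start in range(0, len(bars) - factor + 1, factor):
        chunk = bars[start:start + factor]
        out.append(Bar(
            time=chunk[0].time,               # bar is stamped with its opening time
            open=chunk[0].open,               # first open in the group
            high=max(b.high for b in chunk),  # highest high in the group
            low=min(b.low for b in chunk),    # lowest low in the group
            close=chunk[-1].close,            # last close in the group
        ))
    return out


# Synthetic M1 data, made up purely for illustration.
m1 = [Bar(time=t, open=100 + t, high=102 + t, low=99 + t, close=101 + t)
      for t in range(12)]
m5 = resample(m1, 5)   # two complete M5 bars; the last two M1 bars are dropped
```

With real data, a factor of 5 gives M5 bars and a factor of 240 gives H4 bars, provided the boundary assumption holds. Whether an agent actually learns these aggregates from raw M1 input is an empirical question that this sketch does not answer.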

The other parameters, like balance and equity, help the agent tell apart situations that look similar but differ in account state. For instance, when the agent sees a high-risk-high-reward signal, it can react with different actions based on the amount of balance/equity and its risk capacity. However, you can easily remove the extra parameters from the observation space for your application by making some small changes in the env.
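One way to drop the extra parameters without editing the env source is to wrap the env and filter its dictionary observations. The sketch below is an assumption-laden illustration, not the library's API: `KeepKeys` and `StubEnv` are hypothetical names, the key names (`balance`, `equity`, `margin`, `features`, `orders`) are assumed rather than confirmed here, and the wrapper assumes the Gymnasium-style `reset`/`step` return signatures.

```python
class KeepKeys:
    """Hypothetical wrapper that keeps only selected keys of a dict observation."""

    def __init__(self, env, keep):
        self.env = env
        self.keep = tuple(keep)

    def _filter(self, obs):
        # Raises KeyError if a requested key is missing from the observation.
        return {key: obs[key] for key in self.keep}

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._filter(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._filter(obs), reward, terminated, truncated, info


class StubEnv:
    """Stand-in env with made-up values, used only to exercise the wrapper."""

    def _obs(self):
        return {
            "balance": [10000.0],
            "equity": [10000.0],
            "margin": [0.0],
            "features": [[1.1, 1.2], [1.2, 1.3]],
            "orders": [[0.0]],
        }

    def reset(self, seed=None):
        return self._obs(), {}

    def step(self, action):
        return self._obs(), 0.0, False, False, {}


env = KeepKeys(StubEnv(), keep=["features"])
obs, info = env.reset()
step_result = env.step(action=None)
```

This sketch filters only the returned observations. To train with stable-baselines3, the wrapped env's `observation_space` would need to be narrowed to the same keys as well; Gymnasium's `FilterObservation` wrapper is the usual tool for that.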