understanding action_space

AminHP / gym-mtsim

A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)

MIT License


understanding action_space #10

Closed (sadimoodi closed this issue 2 years ago)

sadimoodi commented 2 years ago

Hello @AminHP, I am struggling to understand your action space. You say, quote: "The action space is a 1D vector of size count(trading_symbols) * (symbol_max_orders + 2). For each symbol, two types of actions can be performed, closing previous orders and placing a new order. The former is controlled by the first symbol_max_orders elements and the latter is controlled by the last two elements."

You also write:

self.action_space = spaces.Box(
    low=-np.inf, high=np.inf,
    shape=(len(self.trading_symbols) * (self.symbol_max_orders + 2),)
)

Why do you add 2 to symbol_max_orders? What are the last 2 elements you are referring to?

AminHP commented 2 years ago

According to the rest of the quote: "[probability of closing order 1, probability of closing order 2, ..., probability of closing order symbol_max_orders, probability of holding, volume of new order]. The last two elements specify whether to hold or place a new order and the volume of the new order (positive volume indicates buy and negative volume indicates sell)"

The last two elements are probability of holding and volume of new order. Let's say symbol_max_orders is 2. Then we have 2+2=4 values for each symbol. The first value is the probability of closing the first order in the orders list of the simulator. The second value is the probability of closing the second order in that list. The third value is the probability of holding or creating a new order. The fourth value is the volume of the new order (it is ignored if the third value is higher than a threshold).
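As a rough illustration of the symbol_max_orders = 2 case described above, the sketch below decodes the four values for a single symbol. It is not the environment's actual code: the function name decode_symbol_action and the two thresholds (close_threshold, hold_threshold) are hypothetical, and mapping the raw values to probabilities with a sigmoid is an assumption, made because the action space is an unbounded Box.

```python
import numpy as np


def sigmoid(x):
    """Map unbounded values to (0, 1). Assumption: raw actions are treated as logits."""
    return 1.0 / (1.0 + np.exp(-x))


def decode_symbol_action(symbol_action, close_threshold=0.5, hold_threshold=0.5):
    """Decode one symbol's slice of the action vector (hypothetical helper).

    Layout: [close order 1, ..., close order N, hold, volume of new order].
    """
    symbol_action = np.asarray(symbol_action, dtype=float)
    close_probs = sigmoid(symbol_action[:-2])  # one probability per existing order slot
    hold_prob = sigmoid(symbol_action[-2])     # probability of holding
    volume = symbol_action[-1]                 # positive = buy, negative = sell

    hold = bool(hold_prob > hold_threshold)
    return {
        "close_order": (close_probs > close_threshold).tolist(),
        "hold": hold,
        # The volume is ignored when the hold probability exceeds the threshold.
        "new_order_volume": None if hold else float(volume),
    }


# symbol_max_orders = 2, so each symbol gets 2 + 2 = 4 values.
decoded = decode_symbol_action([3.0, -3.0, -2.0, 0.5])
print(decoded)
```

With these example values the first order would be closed, the second kept, and a new buy order of volume 0.5 placed, because the hold probability is below the assumed threshold.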

sadimoodi commented 2 years ago

@AminHP thanks for the explanation; the example made it clearer. Please update the sentence on the home page to: "[probability of closing order 1, probability of closing order 2, ..., probability of closing order symbol_max_orders, probability of holding OR CREATING A NEW ORDER, volume of new order]".

You missed the part in bold capitals.

sadimoodi commented 2 years ago

This brings me to the next question: what is inside the action array here?

def _apply_action(self, action: np.ndarray) -> Tuple[Dict, Dict]:
    orders_info = {}
    closed_orders_info = {symbol: [] for symbol in self.trading_symbols}

    k = self.symbol_max_orders + 2

    for i, symbol in enumerate(self.trading_symbols):
        symbol_action = action[k*i:k*(i+1)]
        close_orders_logit = symbol_action[:-2]
        hold_logit = symbol_action[-2]
        volume = symbol_action[-1]

How do you represent the actions to be taken inside action[k*i:k*(i+1)]?

AminHP commented 2 years ago

The action array has (symbol_max_orders + 2) * len(trading_symbols) elements. The first symbol_max_orders + 2 elements represent the action space for the first symbol, the second symbol_max_orders + 2 elements represent the action space for the second symbol, and so on. Therefore, action[k*i:k*(i+1)] represents the actions of symbol i.
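The layout can be checked with a small standalone sketch. The symbol names and the filler values from np.arange are only examples chosen so that each position is easy to trace; nothing here calls the environment itself.

```python
import numpy as np

trading_symbols = ["EURUSD", "GBPCAD"]  # example symbols, chosen for illustration
symbol_max_orders = 2
k = symbol_max_orders + 2               # values per symbol

# A flat action vector of size len(trading_symbols) * k, filled with 0..7
# so that each element's position is visible in the output.
action = np.arange(len(trading_symbols) * k, dtype=float)

# Slice out the k consecutive values that belong to each symbol.
per_symbol = {
    symbol: action[k * i : k * (i + 1)]
    for i, symbol in enumerate(trading_symbols)
}

for symbol, symbol_action in per_symbol.items():
    print(symbol, symbol_action.tolist())

# Equivalent view: one row per symbol.
as_rows = action.reshape(len(trading_symbols), k)
```

Here the first four values (0 to 3) go to the first symbol and the next four (4 to 7) to the second, which is the same grouping that reshaping the vector to one row per symbol produces.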