Closed GXY2017 closed 2 years ago
Hi @GXY2017,
I used `self._done` in this line of code to calculate the profit at the terminal point. Perhaps it can also be used for reward calculation. Please check out the code below and let me know if it works for you (I haven't tested it):
```python
def _calculate_reward(self, action):
    step_reward = 0  # pip

    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True

    if trade or self._done:
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]
        price_diff = current_price - last_trade_price

        if self._position == Positions.Short:
            step_reward += -price_diff * 10000
        elif self._position == Positions.Long:
            step_reward += price_diff * 10000

    return step_reward
```
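To sanity-check the sign convention in the reward above, here is a self-contained sketch (the `Actions`/`Positions` stubs and `DummyEnv` wrapper are my own minimal stand-ins, not the real env): closing a long 5 pips higher should give +5, and closing a short over the same move should give -5.

```python
from enum import Enum

class Actions(Enum):
    Sell = 0
    Buy = 1

class Positions(Enum):
    Short = 0
    Long = 1

class DummyEnv:
    """Minimal stand-in holding just the state the reward logic reads."""
    def __init__(self, prices, position):
        self.prices = prices
        self._position = position
        self._current_tick = len(prices) - 1
        self._last_trade_tick = 0
        self._done = False

    def _calculate_reward(self, action):
        step_reward = 0  # pip
        trade = ((action == Actions.Buy.value and self._position == Positions.Short) or
                 (action == Actions.Sell.value and self._position == Positions.Long))
        if trade or self._done:
            price_diff = self.prices[self._current_tick] - self.prices[self._last_trade_tick]
            if self._position == Positions.Short:
                step_reward += -price_diff * 10000
            elif self._position == Positions.Long:
                step_reward += price_diff * 10000
        return step_reward

# A long opened at 1.1000 and sold at 1.1005 earns about +5 pips.
long_reward = DummyEnv([1.1000, 1.1005], Positions.Long)._calculate_reward(Actions.Sell.value)
# The same price move against a short loses about 5 pips.
short_reward = DummyEnv([1.1000, 1.1005], Positions.Short)._calculate_reward(Actions.Buy.value)
print(long_reward, short_reward)
```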
It works. Thanks.
The question is about `env._calculate_reward(self, action)`: I think there should be some way to handle the end of each training episode.
Assume we select 5000 HLOC bars for each training episode. The agent sells at bar[-5], and there is no subsequent buy action from which to calculate the ending reward. So we have to assign a reward to it manually, which inevitably alters the training result.
At the moment, I set the ending reward to 0.
But setting it to 0 still gives the agent wrong information, and different assigned values can cause huge swings in the final result. To avoid this problem, is there any way we can set the number of transactions instead of the number of bars as the episode limit? Thank you.
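The idea of bounding episodes by transaction count rather than bar count could be sketched like this (a hypothetical illustration, not part of gym-anytrading: `max_trades`, `run_episode`, and the string-based position state are all my assumptions). The episode ends as soon as the trade budget is spent, so it never finishes with a dangling open position mid-data; it only falls back to the end of the price array if the budget is never reached.

```python
def run_episode(prices, actions, max_trades=10):
    """Step through `actions`; stop after `max_trades` trades or at the last bar."""
    position = 'short'  # simplified position state: 'short' or 'long'
    trades = 0
    tick = 0
    done = False
    while not done:
        action = actions[tick]
        # A trade happens when the action flips the current position.
        if (action == 'buy' and position == 'short') or \
           (action == 'sell' and position == 'long'):
            trades += 1
            position = 'long' if action == 'buy' else 'short'
        tick += 1
        # End on the trade budget OR on running out of bars, whichever comes first.
        done = trades >= max_trades or tick >= len(prices)
    return tick, trades

# With a budget of 2 trades, the episode ends right after the second flip,
# regardless of how many bars remain.
ticks_used, n_trades = run_episode(
    prices=[1.0] * 100,
    actions=['buy', 'hold', 'sell', 'buy', 'sell'] + ['hold'] * 95,
    max_trades=2,
)
print(ticks_used, n_trades)  # 3 2
```

The trade-off is that episode lengths (in bars) then vary between episodes, which some RL setups handle less gracefully than a fixed horizon.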
**Update** I found a solution that seems promising but still suffers from information loss. I added this to trading_env
and altered the following part accordingly.
The major drawback of this method is that it drops the last several bars, so I cannot apply it in the validation process, and of course I will need those last bars in real trading. Is there a more efficient and accurate solution?