Strategy Gen 6: Broker Stat

JaCoderX commented 4 years ago

@Kismuz,

I'm currently exploring the code that generate the broker datalines. My aim is to expend the stats to include a basic measurement for short trading.

When testing the output of the borker stats I came across a few problems that made me think that maybe this part of the code needs a rework:

realized and unrealized values are never non zero together (should both be non zero on partial sell of position)
decrease on exposure (to non zero value) doesn't show on realized pnl (trade_result is 0)
unrealized current calculation is a misleading and confusing. pnl = (current_value - self.realized_broker_value) * normalizer but... self.realized_broker_value = self.broker.get_value() which I think actually mean value_before - value_after. it should be something like: unrealize = value - cash - realize

Kismuz commented 4 years ago

@JacobHanouna, you are possibly right; would you pull some code suggestions?

JaCoderX commented 4 years ago

From backtrader trade.py:

class Trade(object):
    Keeps track of the life of an trade: size, price,
    commission (and value?)

    An trade starts at 0 can be increased and reduced and can
    be considered closed if it goes back to 0.

    The trade can be long (positive size) or short (negative size)

From BTGym base.py:

def notify_trade(self, trade):
    if trade.isclosed:
...

The use of the attribute isclosed as a condition to indicate an active trade is incorrect. but even replacing trade.isclosed won't solve this issue, because when printing the trade object I notice that trade.pnl is non-zero only when size is zero.

So condition need to be fix but more investigation is needed to understand why the trade object is missing information

JaCoderX commented 4 years ago

@Kismuz , I finally got to investigate this issue a bit more in depth. according to backtrader, notify_trade() is called only on 2 conditions:

trade start - indicated by position changing from 0 to x
trade end - indicated by position changing from x to 0

So any order that change the position from non-zero size to a different non-zero size won't trigger notify_trade().

The problem is that both get_broker_realized_pnl and get_broker_unrealized_pnl base their calculations on notify_trade().

I'll keep investigating it to find a solution

JaCoderX commented 4 years ago

I think I got to the bottom of this issue. The trade object can in fact be regarded as an aggregation of executed orders, but we only gain access to it after the first executed order and on the last one. This means that every executed order that is in between is ignored by profit and loss calculations. So instead of having PnL calculations after every order we make, we actually have the PnL of multiple orders aggregated under one trade.

From the reward perspective this mean that we are working on a sparse reward setup and instead of learning action-to-reward relation, we force the system to learn how a trajectory of actions is rewarded.

Going over backtrader code, I found that it possible to retrieve information on individual orders under the trade.history attribute. I'll post a pull request once it is resolved

Kismuz commented 4 years ago

From the reward perspective this mean that we are working on a sparse reward setup and instead of learning action-to-reward relation, we force the system to learn how a trajectory of actions is rewarded.

yup. that's why reward function consists of several terms: main reward term (realized pnl) relies on notify_trade to get 'spike' reward when position crosses zero, and the other term is 'potential function' based on current unrealized pnl which we retrieve from broker at every step (see. A.Ng papers on reward shaping, cited in get_reward() docstring) - it provides smaller 'right direction' signal

JaCoderX commented 4 years ago

main reward term (realized pnl) relies on notify_trade to get 'spike' reward when position crosses zero

This is where we can get improvement, instead of waiting for the position to cross zero to get the 'spike' (can be several cycles of rewards). we need to get the realized pnl on any order that reverse the trade direction, this way we get the right reward on the next cycle with no delays.

Kismuz commented 4 years ago

do I understand it correctly: your point is to estimate real.pnl resulting from partial reversion of trade direction, e.g. if we got 3 positions long and agent issues 1 short we consider it realized pnl resulting from reducing long position by 1?

JaCoderX commented 4 years ago

correct.

I'm still working on the code but this is the general idea in BaseStrategy6:

    def notify_trade(self, trade):
        if trade.justopened:
            self.trade = trade

    def update_broker_stat(self):
        self.update_sub_trade()
        ...

    def get_broker_realized_pnl(self, normalizer, **kwargs):
        return self.sub_trade['pnlcomm'] * normalizer

    def update_sub_trade(self):

        self.sub_trade['pnl'] = 0.0
        self.sub_trade['pnlcomm'] = 0.0

        if self.trade is None:
            return

        current_sub_trade_count = len(self.trade.history)

        # reset - trade object had changed
        if current_sub_trade_count < self.last_sub_trade_count:
            self.last_sub_trade_count = 0

        # update sub trade on every new entry in trade.history
        # data from trade.history get aggregated as trade evolve but we need the unaggregated terms
        if current_sub_trade_count > self.last_sub_trade_count and current_sub_trade_count > 1:
            self.sub_trade['pnl'] = self.trade.history[-1].status.pnl - self.trade.history[-2].status.pnl
            self.sub_trade['pnlcomm'] = self.trade.history[-1].status.pnlcomm - self.trade.history[-2].status.pnlcomm

        # keep count on the number of sub trades
        self.last_sub_trade_count = len(self.trade.history)

JaCoderX commented 4 years ago

@Kismuz,

I finished testing this new concept on the sin data. It seem to perform just as good as current logic for this simple example.

I can make a pull request on gen 6 but it would change a few of the current functionalities. for example, reward use pos_duration stat which is redundant in this new logic (reward get calculated fully on each skip_frame). This might break functionality on dev made on top of gen 6. plus I didn't maintained unused functionalities, e.g. get_broker_max_unrealized_pnl or I could create a new version? gen 7?

Kismuz commented 4 years ago

or I could create a new version? gen 7? yes, it is logical to separate it as research evolution.

Kismuz / btgym

Strategy Gen 6: Broker Stat #117