Closed JaCoderX closed 4 years ago
@JacobHanouna, you are possibly right; would you pull some code suggestions?
From backtrader trade.py
:
class Trade(object):
Keeps track of the life of an trade: size, price,
commission (and value?)
An trade starts at 0 can be increased and reduced and can
be considered closed if it goes back to 0.
The trade can be long (positive size) or short (negative size)
From BTGym base.py
:
def notify_trade(self, trade):
if trade.isclosed:
...
The use of the attribute isclosed
as a condition to indicate an active trade is incorrect.
but even replacing trade.isclosed
won't solve this issue,
because when printing the trade
object I notice that trade.pnl is non-zero only when size is zero.
So condition need to be fix but more investigation is needed to understand why the trade
object is missing information
@Kismuz , I finally got to investigate this issue a bit more in depth.
according to backtrader, notify_trade()
is called only on 2 conditions:
So any order that change the position from non-zero size to a different non-zero size won't trigger notify_trade()
.
The problem is that both get_broker_realized_pnl
and get_broker_unrealized_pnl
base their calculations on notify_trade()
.
I'll keep investigating it to find a solution
I think I got to the bottom of this issue. The trade object can in fact be regarded as an aggregation of executed orders, but we only gain access to it after the first executed order and on the last one. This means that every executed order that is in between is ignored by profit and loss calculations. So instead of having PnL calculations after every order we make, we actually have the PnL of multiple orders aggregated under one trade.
From the reward perspective this mean that we are working on a sparse reward setup and instead of learning action-to-reward relation, we force the system to learn how a trajectory of actions is rewarded.
Going over backtrader code, I found that it possible to retrieve information on individual orders under the trade.history attribute. I'll post a pull request once it is resolved
From the reward perspective this mean that we are working on a sparse reward setup and instead of learning action-to-reward relation, we force the system to learn how a trajectory of actions is rewarded.
yup. that's why reward function consists of several terms: main reward term (realized pnl) relies on notify_trade
to get 'spike' reward when position crosses zero, and the other term is 'potential function' based on current unrealized pnl which we retrieve from broker at every step (see. A.Ng papers on reward shaping, cited in get_reward() docstring) - it provides smaller 'right direction' signal
main reward term (realized pnl) relies on notify_trade to get 'spike' reward when position crosses zero
This is where we can get improvement, instead of waiting for the position to cross zero to get the 'spike' (can be several cycles of rewards). we need to get the realized pnl on any order that reverse the trade direction, this way we get the right reward on the next cycle with no delays.
do I understand it correctly: your point is to estimate real.pnl resulting from partial reversion of trade direction, e.g. if we got 3 positions long and agent issues 1 short we consider it realized pnl resulting from reducing long position by 1?
correct.
I'm still working on the code but this is the general idea
in BaseStrategy6
:
def notify_trade(self, trade):
if trade.justopened:
self.trade = trade
def update_broker_stat(self):
self.update_sub_trade()
...
def get_broker_realized_pnl(self, normalizer, **kwargs):
return self.sub_trade['pnlcomm'] * normalizer
def update_sub_trade(self):
self.sub_trade['pnl'] = 0.0
self.sub_trade['pnlcomm'] = 0.0
if self.trade is None:
return
current_sub_trade_count = len(self.trade.history)
# reset - trade object had changed
if current_sub_trade_count < self.last_sub_trade_count:
self.last_sub_trade_count = 0
# update sub trade on every new entry in trade.history
# data from trade.history get aggregated as trade evolve but we need the unaggregated terms
if current_sub_trade_count > self.last_sub_trade_count and current_sub_trade_count > 1:
self.sub_trade['pnl'] = self.trade.history[-1].status.pnl - self.trade.history[-2].status.pnl
self.sub_trade['pnlcomm'] = self.trade.history[-1].status.pnlcomm - self.trade.history[-2].status.pnlcomm
# keep count on the number of sub trades
self.last_sub_trade_count = len(self.trade.history)
@Kismuz,
I finished testing this new concept on the sin data. It seem to perform just as good as current logic for this simple example.
I can make a pull request on gen 6 but it would change a few of the current functionalities.
for example, reward use pos_duration
stat which is redundant in this new logic (reward get calculated fully on each skip_frame
). This might break functionality on dev made on top of gen 6.
plus I didn't maintained unused functionalities, e.g. get_broker_max_unrealized_pnl
or I could create a new version? gen 7?
or I could create a new version? gen 7? yes, it is logical to separate it as research evolution.
@Kismuz,
I'm currently exploring the code that generate the broker datalines. My aim is to expend the stats to include a basic measurement for short trading.
When testing the output of the borker stats I came across a few problems that made me think that maybe this part of the code needs a rework:
pnl = (current_value - self.realized_broker_value) * normalizer
but...self.realized_broker_value = self.broker.get_value()
which I think actually mean value_before - value_after. it should be something like: unrealize = value - cash - realize