Closed Pirat83 closed 4 months ago
Hi @Pirat83,
Inside of your execution, you can check the bar's current date using ctx.date to see if your strategy should start trading. Does that help?
Well, I already implemented this check. Two additional issues arise in this case.
1) Not all indicators are initialized. Sometimes an ExecutionContext is missing when trading, and so far I don't know why the bar data is missing at some points in time. The Strategy then starts trading somewhere in, let's say, April. This makes Strategies that rely on multiple instruments hard to implement. Let's say: if RSI(20) of QQQ <= 30, buy SQQQ, else buy TQQQ. In this situation shorting is an option, but in most of my other Strategies I don't have this option.
2) Some of the metrics require the correct start / end date. This is nice to have for me because we will use Quant Stats anyway. The metrics also require the risk free rate to be correct. So this issue is on my to-do list and optional for me.
I will change the example in the GIT repository to reflect the changes.
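For readers following along, here is a minimal sketch of the QQQ rule described above, written in plain Python without PyBroker. The function names are hypothetical, and the RSI is a simple-average variant rather than Wilder's smoothed version, so values may differ from what an indicator library reports. The point is the `None` return while the indicator is still warming up, which is the case that makes multi-instrument rules awkward.

```python
from typing import Optional, Sequence


def simple_rsi(closes: Sequence[float], length: int) -> Optional[float]:
    """Simple-average RSI over the last `length` price changes.

    Returns None while fewer than `length + 1` closes are available,
    i.e. while the indicator is still warming up.
    """
    if len(closes) < length + 1:
        return None
    window = closes[-(length + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    return 100.0 - 100.0 / (1.0 + gains / losses)


def pick_symbol(qqq_closes: Sequence[float], length: int = 20) -> Optional[str]:
    """Apply the rule from the thread: RSI <= 30 -> SQQQ, else TQQQ."""
    rsi = simple_rsi(qqq_closes, length)
    if rsi is None:
        return None  # signal not ready: do not trade yet
    return "SQQQ" if rsi <= 30 else "TQQQ"
```

A caller would skip the bar whenever `pick_symbol` returns `None` instead of trading on an uninitialized signal.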
It sounds like the problem you're having has to do with bars not being shared with all of your instruments. For instance, if you are using Strategy#set_before_exec or Strategy#set_after_exec, the ExecContexts will only be passed for instruments that have data for that bar.
Assuming that is true, there is not much PyBroker can do for you in that case since data is missing.
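One way to confirm whether missing bars are the cause is to compare the timestamps available per symbol before running the backtest. The sketch below is a hypothetical helper, not part of PyBroker; it assumes you have already extracted the bar timestamps for each symbol from your data.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def missing_bars(
    bars_by_symbol: Dict[str, Iterable[datetime]],
) -> Dict[str, List[datetime]]:
    """For each symbol, list timestamps that other symbols have but it lacks."""
    seen = {symbol: set(stamps) for symbol, stamps in bars_by_symbol.items()}
    # Union of every timestamp observed for any symbol.
    all_stamps = set().union(*seen.values())
    return {
        symbol: sorted(all_stamps - stamps)
        for symbol, stamps in seen.items()
    }
```

An empty list for every symbol means all instruments share the same bars; any non-empty list points at the bars for which no context would be passed for that symbol.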
Yeah, I have experimented with a "forward fill DataSource adapter" design pattern.
This is very complicated, since an RSI(20) on QQQ is not comparable to an RSI(20) on SPY if data is missing for one of those instruments in the last 20 candles. In that case we would need to use an RSI(19) for the other instrument. This gets very complicated when you have several hundred instruments in your strategy, and it is not very intuitive for the consumer, because the missing data on QQQ has an impact on the RSI length of SPY.
Hi @Pirat83,
Have you considered modifying your data in Pandas first to fix the missing data issues you have? You can then use the Pandas DataFrame as a DataSource.
Hi @edtechre,
this is exactly what I have done (don't be confused by the name of this adapter; in the long term it should store and read data from TimescaleDB):
from datetime import datetime
from typing import Optional
import pandas as pd
from pandas import DataFrame
from pybroker import DataCol
from pybroker.data import DataSource, Alpaca
class TimeScaleDBDataSource(DataSource):
    def __init__(self, delegate: Alpaca = None):
        super(TimeScaleDBDataSource, self).__init__()
        import os
        self.delegate = delegate if delegate is not None else Alpaca(os.getenv('ALPACA_KEY_ID'), os.getenv('ALPACA_SECRET'))

    @staticmethod
    def _fill_and_reset_index(group: DataFrame, start_date: datetime, end_date: datetime, timeframe: str):
        interval = pd.Timedelta(minutes=15)
        from pandas import DatetimeIndex
        index: DatetimeIndex = pd.date_range(start=start_date, end=end_date, freq=interval, tz='US/Eastern')
        group = group.reindex(index)
        group = group.ffill()
        group = group.bfill()
        group['date'] = group.index
        return group

    def _fetch_data(self, symbols: frozenset[str], start_date: datetime, end_date: datetime, timeframe: Optional[str], adjust: Optional[str]) -> pd.DataFrame:
        # noinspection PyProtectedMember
        result: DataFrame = self.delegate._fetch_data(symbols, start_date, end_date, timeframe, adjust)
        result = result.set_index(DataCol.DATE.value, drop=False)
        result = result.groupby(DataCol.SYMBOL.value).apply(self._fill_and_reset_index, start_date, end_date, timeframe)
        result = result.reset_index(drop=True)
        return result
I use the delegate (the original Alpaca#_fetch_data(...) method) to get the data from Alpaca.
Then I regroup the DataFrame by symbol and create a synthetic index.
Forward filling and backward filling the DataFrame is not the most accurate way to handle this in a production environment, but it is IMHO good enough for a backtesting framework, compared to hassling with different indicator lengths or other more error-prone "solutions".
Backfilling the DataFrame introduces a look-ahead bias, but since this data is only used to warm up indicators that are not really used (the warmup period is in my case subtracted from the start_date), it is okay for me.
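The claim that backfilled rows only land in the warmup period can be checked rather than assumed. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the frame is indexed by timestamp, and it has a `close` column. It flags which rows were synthesized and reports whether any backfilled row falls on or after the intended trading start.

```python
import pandas as pd


def fill_with_flags(group: pd.DataFrame, grid: pd.DatetimeIndex) -> pd.DataFrame:
    """Reindex onto `grid`, ffill/bfill, and record which rows were synthesized."""
    out = group.reindex(grid)
    missing = out["close"].isna()              # rows with no real bar
    first_real = out["close"].first_valid_index()
    out = out.ffill().bfill()
    out["synthetic"] = missing
    if first_real is None:
        out["backfilled"] = missing            # nothing real to fill from
    else:
        # Only gaps before the first real bar can be filled by bfill.
        out["backfilled"] = missing & (out.index < first_real)
    return out


def backfill_leaks_into(filled: pd.DataFrame, trade_start) -> bool:
    """True if any backfilled (look-ahead) row is at or after `trade_start`."""
    mask = filled.index >= pd.Timestamp(trade_start)
    return bool(filled.loc[mask, "backfilled"].any())
```

If `backfill_leaks_into` returns `True`, the look-ahead bias is no longer confined to the warmup bars and the fetch window should be extended further back.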
If anyone has similar issues I can share that code. And I am still convinced that solving those two challenges in the PyBroker framework would help many people.
Thank you for your help. I appreciate your work very much. I have investigated many backtesting frameworks, and PyBroker is at the top in terms of quality, architecture and the codebase.
Hi @Pirat83,
Can you clarify what the two challenges are and your proposed solution?
Hi @edtechre, yes of course:
1) Aligning start_date and end_date to make it easier to compare multiple strategies with different indicator lengths:
See e.g.: https://github.com/edtechre/pybroker/blob/master/src/pybroker/strategy.py#L226
Maybe this property should have a different name than warmup. That way each Strategy would start trading on exactly the same day (which makes it easier to compare multiple Strategies), and indicators are warmed up before the start_date of backtesting. Backtesting starts exactly at start_date with all indicators warmed up.
Please keep in mind >=, > or < and <=: I am not sure yet which one to choose to be consistent with the rest of PyBroker's architecture.
2) Forward filling and backfilling the Pandas DataFrame to make indicator usage easier / consistent when data is not present
Simply add an ffill and bfill at https://github.com/edtechre/pybroker/blob/master/src/pybroker/data.py#L389:
@staticmethod
def _fill_and_reset_index(group: DataFrame, start_date: datetime, end_date: datetime, timeframe: str):
    interval = pd.Timedelta(timeframe)
    from pandas import DatetimeIndex
    index: DatetimeIndex = pd.date_range(start_date, end_date, freq=interval, tz='US/Eastern')
    group = group.reindex(index)
    group = group.ffill()
    group = group.bfill()
    group['date'] = group.index
    return group

def _fetch_data(self, symbols: frozenset[str], start_date: datetime, end_date: datetime, timeframe: Optional[str], adjust: Optional[str]) -> pd.DataFrame:
    # noinspection PyProtectedMember
    result: DataFrame = self.delegate._fetch_data(symbols, start_date, end_date, timeframe, adjust)
    result = result.set_index(DataCol.DATE.value)
    result = result.groupby(DataCol.SYMBOL.value).apply(self._fill_and_reset_index, start_date, end_date, timeframe)
    result = result.reset_index(drop=True)
    return result
I would strongly suggest adding a feature toggle to StrategyConfig, so people can decide whether they want to use this. E.g. bfill adds data that did not exist in reality in this form. And ffill needs to be handled in a live trading environment anyway.
This change could also be made in the other DataSources if they have the same challenge as Alpaca.
I have found an issue in my code:
group = group.reindex(index)
This needs better validation. It sets the whole group to NaN when start_date / end_date does not start at 0:00h on a daily timeframe.
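A possible guard for the reindex problem, sketched with hypothetical helper names: snap the grid to midnight for daily (or longer) timeframes, and refuse to reindex when the grid shares no timestamps with the data, since that is the case in which every row becomes NaN. This assumes daily bars are stamped at 0:00 and that the timeframe string is something `pd.Timedelta` can parse.

```python
import pandas as pd


def build_grid(start, end, timeframe: str, tz=None) -> pd.DatetimeIndex:
    """Build the target index; align to midnight for daily or longer bars."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    step = pd.Timedelta(timeframe)
    if step >= pd.Timedelta(days=1):
        # Daily bars are assumed to be stamped at 0:00, so drop the time part.
        start, end = start.normalize(), end.normalize()
    return pd.date_range(start, end, freq=step, tz=tz)


def reindex_checked(group: pd.DataFrame, grid: pd.DatetimeIndex) -> pd.DataFrame:
    """Reindex and fill, but fail loudly if the grid misses every real bar."""
    if not grid.isin(group.index).any():
        raise ValueError("grid does not overlap the data; reindex would yield all NaN")
    return group.reindex(grid).ffill().bfill()
```

Raising here is a design choice for the sketch; logging and returning the original group unchanged would be an equally reasonable reaction.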
Thank you for your thoughtful input, @Pirat83.
Aligning start_date and end_date to improve comparing multiple strategies with different indicator length
Can you explain what you mean by different indicator length here? Also what you mean by start_date_trade, end_date_trade?
Hi @edtechre, thank you for your time:
I need to select one Strategy to fit the market conditions. Therefore I need to make all my strategies comparable with each other. I achieved this by skipping candles and then starting the backtest from exactly the same candle.
Let's compare 2 Strategies:
One uses the RSI 14 and one the RSI 21, and I am interested in the results of each Strategy per candle. We will start with the first trading day of the year, which was 2023-01-03.
In reality the Strategies would be much more complicated, but let's keep it simple:
1) RSI 14: The RSI 14 Strategy requires 14 candles of warmup to calculate the RSI(14). So it needs data from 2022-12-09 until 2022-12-30 to warm up. Then we can use the RSI value on 2023-01-03, which is the first trading day of the new year.
2) RSI 21: The RSI 21 Strategy would need 21 candles, so it needs the data from 2022-11-30 to 2022-12-30 to complete the warmup and start trading on 2023-01-03.
After the trading day 2023-01-03 I want to store the metrics and the Portfolio in my TimescaleDB, and then I need to decide which Strategy should be used next. That's the stuff we are talking about in the other tickets, but it is not part of this ticket.
PyBroker currently adds the warmup period to the start_date instead of subtracting it. So with the above 2 Strategies our first trading day would be 2023-01-25 if we use the RSI(14) and 2023-02-02 if we use the RSI(21). This is highly counterintuitive and misleading, and it makes comparing Strategies much harder.
So I have created a workaround:
I have extended the Strategy class and added start_date_trade and end_date_trade.
This way I can specify which candle should be the start. Then I take all my indicators, calculate their lengths, and use the maximum (plus a 30% buffer) as PyBroker's warmup period. The warmup period is subtracted from my start_date_trade to calculate PyBroker's start_date.
This approach also requires filtering the method provided to 'Strategy#add_execution'. It is not very elegant, but for the moment it does what it should do. If you are interested I can also provide some code snippets. This will not be executable code, but it should be enough to find a better solution.
One solution would be to change PyBroker's behavior, e.g. with a feature toggle (to ensure backward compatibility), to subtract the warmup period instead of adding it.
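The filtering of the execution function mentioned above could look roughly like the decorator below. This is only a sketch: `trade_window` is a hypothetical name, it assumes the context object exposes the current bar's timestamp as `ctx.dt`, and it treats both bounds as inclusive, which is exactly the >= versus > question left open earlier in the thread.

```python
from datetime import datetime
from functools import wraps
from typing import Any, Callable, Optional


def trade_window(start: datetime, end: Optional[datetime] = None) -> Callable:
    """Skip the wrapped execution function outside [start, end]."""

    def decorator(exec_fn: Callable[[Any], None]) -> Callable[[Any], None]:
        @wraps(exec_fn)
        def wrapper(ctx: Any) -> None:
            bar_time = ctx.dt  # assumption: current bar timestamp
            if bar_time < start:
                return  # still warming up: indicators update, no trading
            if end is not None and bar_time > end:
                return  # past the comparison window
            exec_fn(ctx)

        return wrapper

    return decorator
```

Every strategy wrapped with the same `start` then places its first order on the same bar, regardless of how long its indicators need to warm up.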
If you are interested I can also provide some code snippets. This will not be executable code, but it should be enough to find a better solution.
Yes, please share it!
One solution would be to change PyBroker's behavior, e.g. with a feature toggle (to ensure backward compatibility), to subtract the warmup period instead of adding it.
This makes sense to me, I can support this via an additional config option.
Hi @edtechre - I have added some example code to https://github.com/Pirat83/pybroker-experiments/tree/issue-69-and-51
Well, in theory what I am doing is easy. In practice it is a little bit complicated.
I have indicators on a daily chart, e.g. an SMA 200, SMA 50, SMA 20, etc. I take all the indicators I have and take the maximum length, in this use case 200. This value is stored in the warmup period variable. Then I subtract the warmup from the start_date_time to fetch the data.
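As a small illustration of that calculation, including the 30% buffer mentioned earlier in the thread: the function name and the `buffer_pct` parameter are hypothetical, and integer arithmetic is used so the result does not depend on floating-point rounding.

```python
from typing import Iterable


def warmup_bars(indicator_lengths: Iterable[int], buffer_pct: int = 30) -> int:
    """Longest indicator length plus a percentage buffer, rounded up."""
    longest = max(indicator_lengths)
    extra = -(-longest * buffer_pct // 100)  # ceiling division on integers
    return longest + extra
```

For the SMA 200 / 50 / 20 example this gives 260 bars with the default buffer, or 200 bars with `buffer_pct=0`.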
The Strategy#backtest(start_date=xxx, end_date=xxx, ...) method should then filter the candles where backtesting should actually take place, but it is not so easy and I needed to implement it myself. See: https://github.com/edtechre/pybroker/issues/69#issuecomment-1868544466
Please correct me if I have misunderstood something.
Thank you very much for your time and work.
Hi @Pirat83,
I had time to think about this more. Subtracting the warmup period will not work when querying data from a DataSource. The start_time is needed to fetch data from a remote data source (e.g. Yahoo Finance, Alpaca), but it won't be (easily) possible to know the new start time, offset by the warmup period, before the data has been fetched. Yet that offset start time needs to be known in order to query the data source in the first place.
What I would suggest doing is querying the DataFrame from a DataSource, and then subtracting the warmup period from your intended start date to find the offset start_date in the DataFrame to use for your backtest.
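The suggestion above can be sketched by counting bars backwards in the data that was actually fetched, which avoids needing an exchange calendar. The helper below is hypothetical and works on a sorted sequence of bar dates (for example, the unique values of the date column); it returns the offset start and the first bar on which trading would begin.

```python
from bisect import bisect_left
from datetime import date
from typing import Sequence, Tuple


def warmup_offset(
    bar_dates: Sequence[date], trade_start: date, warmup: int
) -> Tuple[date, date]:
    """Return (offset_start, first_trading_bar) for the given warmup in bars.

    `bar_dates` must be sorted ascending. If `trade_start` is not a bar
    (weekend, holiday), the next available bar is used.
    """
    first_trade_idx = bisect_left(bar_dates, trade_start)
    if first_trade_idx == len(bar_dates):
        raise ValueError("no bar on or after trade_start")
    if first_trade_idx < warmup:
        raise ValueError("not enough history before trade_start for the warmup")
    return bar_dates[first_trade_idx - warmup], bar_dates[first_trade_idx]
```

Passing the returned offset start as the backtest start date together with the same `warmup` should, assuming the warmup is counted in bars from that start, put the first trades on the returned trading bar.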
Hello @edtechre,
I have studied https://github.com/edtechre/pybroker/issues/48 a little more intensively. From what I have understood, there are 2 different start_date / end_date combinations:
1) Strategy(start_date, end_date) -> is used to define the interval of the data that is fetched
2) Strategy#backtest(start_date, end_date) -> is used to define the backtest interval
The Strategy#backtest(..., warmup) parameter is added to the start_date, and then the Strategy starts trading from start_date + warmup until end_date.
Here is an example with a warmup period of 5 daily bars: https://github.com/Pirat83/pybroker-experiments/blob/master/main.py
My challenge is to make multiple strategies comparable. So they should all start trading on 2023-01-01 regardless of their warmup period.
So it is a little bit complicated for me to calculate the concrete start_date - warmup so that the first trades are done exactly on 2023-01-01. On the daily timeframe we just have holidays, weekends and days where the stock exchanges are simply closed. But things start to get very messy if I want to multiply the indicator values by a timeframe multiplier to apply the daily logic on a lower timeframe (e.g. 390 when trading 1 min bars). In this scenario things get very complicated.
So is there an option to solve my issue without requiring a business day calendar and a list of days when the stock exchanges were open or not?
I think once the data is read, the warmup period can be subtracted and all indicators can be calculated. That way everything would be warmed up before the Strategy.backtest(start_date, ....) date, and the first trades could happen on this date (if it is a day where the stock exchanges were open; otherwise the next candle could be taken to start). In the example above 2022-01-03 would be the date to start trading.
Ideally we could add an additional param to StrategyConfig to keep the APIs and the behavior backward compatible (if desired). I hope that calculating the warmup would then be much easier (for me) and backtesting results would be more comparable.
What do you think about this idea? Or maybe there is an easy solution for this which I have not found yet?
Thank you for your time and your effort. I really like your work.