Open heinen opened 4 years ago
Not sure if this is related but Kama has this line: kama = self._check_fillna(kama, value=self._close)
here: https://github.com/bukosabino/ta/blob/master/ta/momentum.py#L296
Most other functions pass 0
or 1
for the value
param of _check_fillna
, whereas Kama passes the entire close
series.
_check_fillna
uses ffill
: https://github.com/bukosabino/ta/blob/master/ta/utils.py#L23, which in a timeseries, should take the earliest known value and propagate that forward. So, it's possible there's a bug, but it also seems possible that it's pushing past data into the future, rather than bringing future data into the present.
Also, the comment here (https://github.com/bukosabino/ta/blob/master/ta/utils.py#L14) about a -1
value triggering bfill
doesn't seem accurate. According to https://github.com/bukosabino/ta/blob/master/ta/utils.py#L23 it looks like it simply replaces empty values with -1
. Regardless, backfilling data if it was happening would seem to potentially be "looking ahead" at closing prices by bringing future data into the past.
Interestingly, ichimoku
indicators use a -1
value for _check_fillna
as seen here: https://github.com/bukosabino/ta/blob/master/ta/trend.py#L340. However, if fillna
is set to False
, these values shouldn't even matter.
Interesting bug though unless I'm missing something obvious.
Sounds very serious. Unintentional forward looking is the worst kind of bug in this type of library, IMHO. I haven't looked into the ones you mention, nor any other, but I know that the visual Ichimoku is supposed to look into the future. It's controlled by the visual
argument, which shifts n2
if True (default is False).
EDIT: for anyone who might be interested, I implemented a simple check based on @heinen's code above in my pipeline:
def perform_TIs_forward_lookingness_check(df):
""" Check for potential forward lookingness of TIs """
test_len = 5000
df_small = df.iloc[:test_len*2].copy()
df_smaller = df.iloc[:test_len].copy()
df_small = add_technical_features(df=df_small)
df_smaller = add_technical_features(df=df_smaller)
ti_cols = [col for col in df_small.columns if 'ti_' in col]
for ti_col in ti_cols:
assert df_smaller[ti_col].equals(df_small[ti_col].iloc[:test_len]), f'TI {ti_col} potentially forward looking'
return
where function add_technicalfeatures appends new columns prefixed with "ti" to an OHLC df. As an example, the assert fails on ta.trend.IchimokuIndicator.ichimoku_a()
if visual = True
but not for visual=False
.
I am personally very surprised to notice that the Ichimoku indicator has a serious data leak problem and it defeats the purpose of using a library like this. Given that feature importance may reveal an issue like this, what happens if there are other data leak issues in this library? This may kill an entire research effort.
In addition to Ichimoku the DPO indicator is also leaking data. But how do we know there is not more leakage in the rest of the indicators?
There are few hidden gems in this library. Some I could find:
Elements that roll beyond the last position are re-introduced at the first.
It might be nice to have a MLStrategy
as a default without any data leaks (e.g. kst/kama/ichimoku/dpo excluded or reconfigured).
Hey folks,
Either I don't understand how ta works or I've found a bug. I would like to calculate a history of all technical indicators for a lot of stocks. It seems, though, as if ta also looks into the future for some of the indicators calculated in
add_all_ta_features
. I've constructed a minimum example to illustrate my problem usingta.momentum.kama
:This assertion fails. As far as I can see, the only way for it to fail is if kama looks at close values in the future. Am I missing something here? I.e.: Are technical indicators something else than different kinds of moving averages over the past? When using
add_all_ta_features
, I have similar issues with volume_vpt, trend_vortex_ind_pos, trend_vortex_ind_neg, trend_vortex_ind_diff, trend_trix, trend_dpo, trend_kst, trend_kst_sig, trend_kst_diff, trend_visual_ichimoku_a, trend_visual_ichimoku_b, others_dr.Interestingly, when setting
fillna=False
, the above example does not fail the assertion. When usingadd_all_ta_features
withfillna=False
, though, I still have the same we-seem-to-be-looking-into-the-future-problem but with different features (tavolume_vpt, ta__trend_kst, tatrend_kst_sig, tatrend_kst_diff, tatrend_visual_ichimoku_a, tatrend_visual_ichimoku_b, ta__momentum_kama, taothers_dr). A minimal example for those is a bit more complex to construct but possible if necessary.