Heerozh / spectre

GPU-accelerated Factors analysis library and Backtester
GNU General Public License v3.0

how should OneHotEncoding be used? #7

Open flamz3d opened 4 years ago

flamz3d commented 4 years ago

Hello, I'm trying to create a dataset, and one feature I'd like to include is WEEKDAY, encoded as a one-hot vector.

I tried engine.add(factors.filter.OneHotEncoder(factors.WEEKDAY), "weekday") and engine.add(factors.WEEKDAY.one_hot(), "weekday"). The encoder seems to be called and the values encoded properly; however, I get an error saying factors cannot return multiple values.

What's the proper way to use the OneHotEncoder filter?

Heerozh commented 4 years ago

The OneHotEncoder factor returns multiple values, one per class. For example, if you encode [1, 2, 2, 3], it returns 3 values: [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1].

So just index the result with [] and add each output separately:

onehots = factors.WEEKDAY.one_hot()
for i in range(5):
    engine.add(onehots[i], "weekday{}".format(i+1))

By the way, other factors such as RollingLinearRegression also return multiple values (slope and intercept).
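
To make the per-class layout of the [1, 2, 2, 3] example concrete, here is a minimal plain-Python sketch. It is not spectre's implementation and does not use the spectre API; the function name one_hot_by_class is hypothetical, and the sketch assumes classes are ordered by sorted value. It only illustrates why there is one output per class, and therefore why each output has to be added to the engine under its own name.

```python
def one_hot_by_class(values):
    """Return (classes, rows) with one 0/1 indicator row per distinct class.

    Hypothetical helper for illustration only, not part of spectre.
    """
    classes = sorted(set(values))  # assumption: classes ordered by value
    # rows[k][j] is 1 when values[j] belongs to classes[k], else 0
    rows = [[1 if v == cls else 0 for v in values] for cls in classes]
    return classes, rows


classes, rows = one_hot_by_class([1, 2, 2, 3])
for cls, row in zip(classes, rows):
    print(cls, row)
# 1 [1, 0, 0, 0]
# 2 [0, 1, 1, 0]
# 3 [0, 0, 0, 1]
```

Each row here plays the role of one indexed output (onehots[i]) in the loop above.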