Rachnog / Deep-Trading

Algorithmic trading with deep learning experiments
1.43k stars 695 forks

StandardScaler() for OHLCV data #3

Closed trevorwelch closed 6 years ago

trevorwelch commented 7 years ago

I'm working on time-series classification of financial data (not regression, but similar).

I'm using sklearn.preprocessing.StandardScaler(), although after reading all of your posts on Medium (thanks for the help!) I'm not entirely sure that I'm not screwing it up...

I'm doing something like this to create 'lagged' data for the time window I'm trying to classify:

import pandas as pd
from sklearn.preprocessing import StandardScaler

def lag_data(df_data):
    # channels, lookback and lookforward are defined elsewhere (not shown here)
    features_to_add = []
    for each in channels:
        # One shifted copy of the column per offset, prefixed with its lag
        features_to_add.append(pd.concat([
            df_data[[each]].shift(i).add_prefix("lag_{}_".format(i))
            for i in range(lookforward * -1, lookback)], axis=1))
    return pd.concat(features_to_add, axis=1)

# Instantiate scaler
scaler = StandardScaler()
# Scale the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(OHLCV_data_with_lag.values))

And then like this to prepare for Conv1D in Keras:


import numpy as np

def X_to_Conv1D_arrays(X):
    # Convert X to a NumPy array
    X = np.array(X)

    # Reshape to 3D (samples, steps, 1 channel) for Conv1D
    X = X.reshape(X.shape[0], X.shape[1], 1)

    print("X: ", X.shape)
    print("X: ", type(X))

    return X

I've gotten some decent accuracy, but I'm wondering if this is a faulty way to prepare the data...

GarrisonD commented 6 years ago

@trevorwelch first of all, you can't use scaler.fit_transform(...) on the whole OHLCV_data_with_lag. That's incorrect, because the scaler's mean and standard deviation would then include information from the test set. You should use scaler.fit_transform(...) on the train set and after that scaler.transform(...) on the test set. Maybe that was the problem.
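
A minimal sketch of the fit-on-train, transform-on-test pattern described above. The random `data` array is a hypothetical stand-in for OHLCV_data_with_lag.values, and the 80/20 chronological split is an assumption for illustration, not something taken from the original code:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for OHLCV_data_with_lag.values: 100 rows, 5 columns.
rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=10.0, size=(100, 5))

# Chronological split without shuffling, assuming rows are ordered by time.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # mean/std are learned from train only
test_scaled = scaler.transform(test)        # the train statistics are reused here
```

With this split the training columns come out with zero mean and unit variance, while the test columns generally do not, which is expected: the test set is scaled with statistics it did not contribute to.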