I am developing a neural network to classify time-series data. I know an LSTM would be the right approach for time series, but dist-keras requires the data to be in Spark DataFrame format before it is passed to a trainer.
I am following this LSTM example, and the task is to port it to dist-keras. The timestep is 50, which means the model takes values 0-49 and predicts value 50, and so on. As you can see in the example, the data is pre-processed with numpy before being fed to Keras. Since dist-keras requires the data to be in a Spark DataFrame, I have to take a different approach, which is as follows:
I created the DF directly:
    X_train = train[:, :]
    y_train = train[:, -1]
    raw_dataset_train = sc.createDataFrame(X_train.tolist())
The code above creates a DF with 50 columns (the timestep is 50), which Spark names _1 through _50 by default.
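To make the shape of `train` concrete, here is a minimal NumPy sketch of the windowing described earlier (a run of consecutive values as input, the next value as the target). The helper name `make_windows`, the `n_inputs` parameter and the toy series are assumptions for illustration only; they are not taken from the example being ported.

```python
import numpy as np

def make_windows(series, n_inputs):
    """Slice a 1-D series into rows of n_inputs values followed by the next value."""
    series = np.asarray(series, dtype=float)
    width = n_inputs + 1                      # inputs plus one target value
    n_rows = len(series) - width + 1
    if n_rows <= 0:
        raise ValueError("series is shorter than one window")
    # Each row is one sliding window, shifted by a single step.
    return np.stack([series[i:i + width] for i in range(n_rows)])

# Toy series (assumed data), windowed with 3 inputs per row.
toy_series = np.arange(6)
train = make_windows(toy_series, n_inputs=3)

X_train = train[:, :-1]   # everything except the last column
y_train = train[:, -1]    # last column is the value to predict
```

Note that this sketch slices the features as `train[:, :-1]`, whereas my code keeps every column in `X_train` and drops the label column later, when building the feature list.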
I then remove the _50 column, which is the label in our case, from the list of feature columns and apply the VectorAssembler to the remaining features:
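    from pyspark.ml.feature import VectorAssembler

    # Every column except the label (_50) is treated as a feature.
    features = raw_dataset_train.columns
    features.remove('_50')
    # Pack the feature columns into a single vector column named "features".
    vector_assembler = VectorAssembler(inputCols=features, outputCol="features")
    dataset_train = vector_assembler.transform(raw_dataset_train)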
Now each row of the DF has the two columns I care about: one contains the features and the other contains the label (the _50 column, which I want to train on and later predict). As I see it, this becomes a classification problem. My issues are below:
If my approach is right, how do I define the output labels for my data, given that the output column has no finite set of values? The number of distinct values could be as large as the number of rows in the DF.
Do I still need LSTM layers in my model? I am asking because I have processed the data in a non-LSTM kind of way. (At least that is what I think; I might be wrong.)
Please advise, and let me know if you need more clarification or information on this.
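For context on this second question: Keras LSTM layers expect input of shape (samples, timesteps, features), so a flat row of lagged values would have to be reshaped before it could feed an LSTM. A minimal NumPy sketch of that reshape, using an assumed toy matrix rather than my real data:

```python
import numpy as np

# Toy stand-in for the flat feature matrix: 4 rows of 5 lagged values each (assumed data).
flat_features = np.arange(20, dtype=float).reshape(4, 5)

# Treat each lagged value as one timestep carrying a single feature,
# giving the (samples, timesteps, features) layout.
n_samples, n_timesteps = flat_features.shape
lstm_input = flat_features.reshape(n_samples, n_timesteps, 1)
```

Whether this reshape can be done inside dist-keras itself is part of what I am unsure about.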