Closed mashaye closed 7 years ago
How did you use the ReshapeTransformer exactly? This should work, since in the backend it is using np.reshape.
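For reference, here is a minimal NumPy-only sketch of what that backend reshape amounts to; the 1200-element input length is an assumption chosen to match the (30, 40, 1) target shape discussed below:

```python
import numpy as np

# A flat feature vector, as it might sit in the "features_normalized" column.
# 1200 = 30 * 40 * 1 is assumed here so it matches the target shape.
flat = np.arange(1200.0)

# Conceptually what the transformer does per row in the backend.
matrix = np.reshape(flat, (30, 40, 1))

print(matrix.shape)  # (30, 40, 1)
```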
First of all, I followed your example in workflow.ipynb and tried different configurations, like:
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (30, 40, 1))
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (1, 30, 40, 1))
My convolutional layer is:
model.add(Conv2D(50, (4, 4), activation='relu', input_shape=(30, 40, 1)))
It gives me the following error:
Error when checking : expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 1200)
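The mismatch can be seen without Keras at all: a Conv2D layer built with input_shape=(30, 40, 1) expects a 4-D batch array of shape (batch, 30, 40, 1), while the data arrived as a flat (1, 1200) array. A minimal sketch of the dimension check, using plain NumPy:

```python
import numpy as np

batch = np.zeros((1, 1200))  # the shape Conv2D actually received
print(batch.ndim)            # 2, but the layer expects 4 dimensions

# Reshaping the same data to (batch, rows, cols, channels) satisfies the check.
batch = batch.reshape(-1, 30, 40, 1)
print(batch.shape)           # (1, 30, 40, 1)
```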
Just to be sure, you ran dataframe = reshape_transformer.transform(dataframe)
and supplied "matrix" as the input column for training, right?
Joeri
Yes. Here is part of my code:
raw_dataset = reader.read.format('com.databricks.spark.csv').options(header='true', inferSchema='true').load("data/data_distKeras.csv")
features = raw_dataset.columns
features.remove('Label')
vector_assembler = VectorAssembler(inputCols=features, outputCol="features")
dataset = vector_assembler.transform(raw_dataset)
standard_scaler = StandardScaler(inputCol="features", outputCol="features_normalized", withStd=True, withMean=True)
standard_scaler_model = standard_scaler.fit(dataset)
dataset = standard_scaler_model.transform(dataset)
label_indexer = StringIndexer(inputCol="Label", outputCol="label_index").fit(dataset)
dataset = label_indexer.transform(dataset)
nb_classes = 3  # Number of output classes
nb_features = len(features)
transformer = OneHotTransformer(output_dim=nb_classes, input_col="label_index", output_col="label2")
dataset = transformer.transform(dataset)
dataset = dataset.select("features_normalized", "label_index", "label2")
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (30, 40, 1))
dataset_train = reshape_transformer.transform(dataset)
Could you run the following:
d = dataset_train.select("features_normalized", "matrix").take(1)[0]
fn = np.asarray(d['features_normalized'])
m = np.asarray(d['matrix'])
print("Shape features normalized: " + str(fn.shape))
print("Shape matrix: " + str(m.shape))
Thanks for your time. Actually, I found the bug in my code and now it is working fine.
Ok, no problem. Let me know if you find something else.
I have convolutional layers in my network, and I reshape my dataset in Keras like this:
dataX = np.reshape(data, (data.shape[0], 30, 40, 1))
When I wanted to use dist-keras, I tried to use ReshapeTransformer to reshape the data, but was not successful. I know that one way is to convert the Spark DataFrame to a pandas DataFrame, do the reshaping, and then convert it back to RDDs. However, I am looking for a better way. Do you have any solution for it?
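As an aside, the pandas detour described above boils down to stacking the per-row vectors and reshaping once. A NumPy-only sketch of that step (the row length and sample count are illustrative assumptions, not taken from the poster's data):

```python
import numpy as np

# Rows as they might come out of something like dataframe.toPandas():
rows = [np.random.rand(1200) for _ in range(5)]

# Stack into (n_samples, 1200), then reshape to (n_samples, 30, 40, 1),
# mirroring the Keras-side np.reshape call quoted above.
data = np.stack(rows)
dataX = np.reshape(data, (data.shape[0], 30, 40, 1))

print(dataX.shape)  # (5, 30, 40, 1)
```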
Thanks,