cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

Difficulties with Convolutional Networks #18

Closed · mashaye closed this issue 7 years ago

mashaye commented 7 years ago

I have convolutional layers in my network, and in plain Keras I reshape my data set like this:

```python
dataX = np.reshape(data, (data.shape[0], 30, 40, 1))
```

When I wanted to use dist-keras, I tried the ReshapeTransformer to reshape the data, but without success. I know that one workaround is to convert the Spark DataFrame to a pandas DataFrame, do the reshaping there, and convert it back to an RDD (sketched below), but I am looking for a better way. Do you have a solution for this?
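This is roughly the workaround I mean (a sketch only; `dataset` and `sc` are placeholders for my DataFrame and SparkContext):

```python
import numpy as np

# The round-trip: pull everything to the driver, reshape with numpy,
# then redistribute. It works, but defeats the purpose of staying in Spark.
pdf = dataset.toPandas()
flat = np.array([v.toArray() for v in pdf['features_normalized']])  # (n_rows, 1200)
matrices = flat.reshape((flat.shape[0], 30, 40, 1))                 # (n_rows, 30, 40, 1)
rdd = sc.parallelize(matrices.tolist())
```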

Thanks,

JoeriHermans commented 7 years ago

How did you use the ReshapeTransformer exactly? This should work, since in the backend it is using np.reshape.
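For intuition, this is essentially all it does to each row's feature vector (a plain numpy sketch, not the actual dist-keras code):

```python
import numpy as np

row = np.arange(1200.0)                 # one flattened feature vector
matrix = np.reshape(row, (30, 40, 1))   # what the transformer applies per row
print(matrix.shape)                     # (30, 40, 1)
```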

mashaye commented 7 years ago

First of all, I followed your example in workflow.ipynb and tried different configurations, such as:

```python
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (30, 40, 1))
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (1, 30, 40, 1))
```

My convolutional layer is:

```python
model.add(Conv2D(50, (4, 4), activation='relu', input_shape=(30, 40, 1)))
```

It gives me the following error:

```
Error when checking : expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 1200)
```
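Note that 30 × 40 × 1 = 1200, so the flat vector is evidently reaching Keras un-reshaped. The same mismatch can be reproduced in plain Keras (a standalone sketch, assuming Keras 2):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(50, (4, 4), activation='relu', input_shape=(30, 40, 1)))

flat = np.zeros((1, 1200))                    # un-reshaped row: 30 * 40 * 1 = 1200
# model.predict(flat)                         # raises the same "expected 4 dimensions" error
model.predict(flat.reshape((1, 30, 40, 1)))   # correct: batch axis + (30, 40, 1)
```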

JoeriHermans commented 7 years ago

Just to be sure, you ran `dataframe = reshape_transformer.transform(dataframe)` and supplied `matrix` as the input column for training, right?

Joeri

mashaye commented 7 years ago

Yes. Here is part of my code:

```python
raw_dataset = reader.read.format('com.databricks.spark.csv') \
                         .options(header='true', inferSchema='true') \
                         .load("data/data_distKeras.csv")
features = raw_dataset.columns
features.remove('Label')
vector_assembler = VectorAssembler(inputCols=features, outputCol="features")
dataset = vector_assembler.transform(raw_dataset)

standard_scaler = StandardScaler(inputCol="features", outputCol="features_normalized",
                                 withStd=True, withMean=True)
standard_scaler_model = standard_scaler.fit(dataset)
dataset = standard_scaler_model.transform(dataset)

label_indexer = StringIndexer(inputCol="Label", outputCol="label_index").fit(dataset)
dataset = label_indexer.transform(dataset)
nb_classes = 3  # Number of output classes
nb_features = len(features)
transformer = OneHotTransformer(output_dim=nb_classes, input_col="label_index", output_col="label2")
dataset = transformer.transform(dataset)

dataset = dataset.select("features_normalized", "label_index", "label2")

reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (30, 40, 1))
dataset_train = reshape_transformer.transform(dataset)
```
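The reshaped `matrix` column is then what should go to the trainer as the features column, roughly like this (a sketch modeled on the workflow notebook; the `SingleTrainer` parameters here are an assumption and may differ across dist-keras versions):

```python
from distkeras.trainers import SingleTrainer

# Feed the reshaped "matrix" column (not the flat vector) to the trainer.
# Parameter names follow the dist-keras workflow notebook; treat them as
# an assumption, since they may vary between versions.
trainer = SingleTrainer(keras_model=model, worker_optimizer='adagrad',
                        loss='categorical_crossentropy',
                        features_col="matrix", label_col="label2",
                        num_epoch=1, batch_size=32)
trained_model = trainer.train(dataset_train)
```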

JoeriHermans commented 7 years ago

Could you run the following:

```python
import numpy as np

# Grab one transformed row and inspect both columns' shapes.
d = dataset_train.select("features_normalized", "matrix").take(1)[0]

fn = np.asarray(d['features_normalized'])
m = np.asarray(d['matrix'])

print("Shape features normalized: " + str(fn.shape))
print("Shape matrix: " + str(m.shape))
```

mashaye commented 7 years ago

Thanks for your time. Actually, I found the bug in my code and it is now working fine.

JoeriHermans commented 7 years ago

Ok, no problem. Let me know if you find anything else.