cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

Training an autoencoder taking long time #63

Open raouflamari opened 6 years ago

raouflamari commented 6 years ago

Hi there, I am trying to implement an autoencoder that reconstructs its 76 input features.

Here is my code:

from pyspark.ml.feature import VectorAssembler, MinMaxScaler

# Assemble the 76 input columns into a single vector and scale it to [0, 1].
features = ['f1', 'f2', ...., 'f76']
assembler = VectorAssembler(inputCols=features, outputCol="features")
dataset = assembler.transform(df)
scaler = MinMaxScaler(inputCol="features", outputCol="features_scaled")
scaler_model = scaler.fit(dataset)
dataset = scaler_model.transform(dataset)
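
As a quick sanity check before training (just a minimal sketch using standard DataFrame calls), the scaled column and the row count can be inspected like this:

dataset.select("features_scaled").show(5, truncate=False)  # values should fall in [0, 1]
print("rows:", dataset.count())                            # how many rows go into training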

from keras.models import Sequential
from keras.layers import Dense

# Autoencoder: 76 inputs compressed to 50 hidden units, then reconstructed back to 76.
nb_features = len(features)
model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(nb_features,)))
model.add(Dense(nb_features, activation='sigmoid'))
model.summary()

Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 50)                3850      
_________________________________________________________________
dense_2 (Dense)              (None, 76)                3876      
=================================================================
Total params: 7,726
Trainable params: 7,726
Non-trainable params: 0
_________________________________________________________________
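
For scale, this is roughly what training the same architecture locally with plain Keras on an in-memory sample would look like (a sketch only; X_sample is a made-up NumPy array standing in for the scaled features):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical local sample: 10,000 rows of the 76 scaled features.
X_sample = np.random.rand(10000, 76).astype('float32')

local_model = Sequential()
local_model.add(Dense(50, activation='relu', input_shape=(76,)))
local_model.add(Dense(76, activation='sigmoid'))
local_model.compile(optimizer='adam', loss='mae')

# An autoencoder is fit with the same array as both input and target.
local_model.fit(X_sample, X_sample, epochs=5, batch_size=32)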

from distkeras.trainers import SingleTrainer

# Train on a single Spark worker, using the scaled features as both input and target.
trainer = SingleTrainer(keras_model=model, worker_optimizer="adam",
                        loss="mae", features_col="features_scaled",
                        label_col="features_scaled", num_epoch=5, batch_size=32)
trained_model = trainer.train(dataset)
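
For comparison, a distributed trainer from the same package would be set up along these lines (a sketch based on the dist-keras example notebook; I have not verified the exact keyword arguments, and num_workers=4 is an arbitrary choice):

from distkeras.trainers import ADAG

# Sketch: asynchronous distributed training over several Spark workers instead of one.
dist_trainer = ADAG(keras_model=model, worker_optimizer="adam", loss="mae",
                    num_workers=4, batch_size=32, num_epoch=5,
                    features_col="features_scaled", label_col="features_scaled",
                    communication_window=12)
trained_model = dist_trainer.train(dataset)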

The training has been running for more than 10 hours and is still not finished! Am I missing something?