XifengGuo / DEC-keras

Keras implementation for Deep Embedding Clustering (DEC)
MIT License

How to feed TFRecord data (over 60GB) to the DEC-keras model? #26

Open wangjiangyuan opened 3 years ago

wangjiangyuan commented 3 years ago

Thanks for your great implementation! I've tried to solve a classification problem whose input samples have the shape 1000×221 with the DEC model. I want to train on over 80 thousand samples (a very large array of shape [8000000, 1000, 221], dtype=float32, about 60 GB), so it isn't possible to load the whole dataset into a Python array. After googling, I found that tf.TFRecord can get me around this capacity problem.
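For context, a minimal round-trip sketch (assuming TensorFlow 2.x) of what writing and parsing such a TFRecord file can look like, so the full 60 GB never sits in memory at once. The feature key `"feature"` and the per-sample shape (1000, 221) follow the issue; the file path and function names are hypothetical:

```python
import numpy as np
import tensorflow as tf

def write_tfrecord(path, samples):
    # Serialize each float32 sample tensor into one tf.train.Example record.
    with tf.io.TFRecordWriter(path) as writer:
        for x in samples:
            feat = tf.train.Feature(
                bytes_list=tf.train.BytesList(
                    value=[tf.io.serialize_tensor(tf.constant(x)).numpy()]))
            example = tf.train.Example(
                features=tf.train.Features(feature={"feature": feat}))
            writer.write(example.SerializeToString())

def parse_example(record):
    # Inverse of the serialization above: bytes -> float32 tensor.
    parsed = tf.io.parse_single_example(
        record, {"feature": tf.io.FixedLenFeature([], tf.string)})
    x = tf.io.parse_tensor(parsed["feature"], out_type=tf.float32)
    return tf.reshape(x, (1000, 221))
```

A streaming pipeline over such a file would then be, e.g., `tf.data.TFRecordDataset([path]).map(parse_example).batch(256)`.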

I followed the tutorial on the official TensorFlow site to write a TFRecord file, and I can load the TFRecord into a conventional Keras model. However, I can't find out how to feed it into the DEC model. The input (MNIST) of the DEC model is a single numpy array with shape [70000, 784].
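Note that the MNIST array above is 2-D (samples × features), so a 1000×221 sample would presumably need to be flattened to a 221,000-dimensional vector before it matches DEC's expected input layout. A toy numpy sketch (the batch here is random stand-in data, not the real dataset):

```python
import numpy as np

# Toy batch standing in for the real data: 4 samples of shape (1000, 221).
batch = np.random.rand(4, 1000, 221).astype(np.float32)

# Flatten each sample so the batch matches DEC's 2-D (samples x features) layout.
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (4, 221000)
```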

Like the following:

```python
dataset = tf.data.TFRecordDataset(filenames=[filenames])
parsed_dataset = dataset.map(_parse_function, num_parallel_calls=8)
final_dataset = parsed_dataset.shuffle(buffer_size=number_of_sample).batch(10)
iterator = final_dataset.make_one_shot_iterator()  # iterate the parsed/batched dataset, not the raw `dataset`
parsed_record = iterator.get_next()
feature, label = parsed_record['feature'], parsed_record['label']
```

Keras:

```python
inputs = keras.Input(shape=(1000, 221), name='feature', tensor=feature)
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy', 'categorical_crossentropy'],
              target_tensors=[label])
model.fit(epochs=30, steps_per_epoch=800000 // 256)  # fit the compiled `model`; steps must be an integer
```
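Since the DEC-keras training code expects an in-memory numpy array, one possible workaround (a sketch under that assumption, not the repo's own API) is to drive training yourself: iterate the `tf.data` pipeline in TF 2.x eager mode and call Keras's `train_on_batch` on each batch. The `train_from_dataset` helper and the model/dataset below are hypothetical placeholders:

```python
import tensorflow as tf

def train_from_dataset(model, dataset, epochs):
    # Feed a compiled Keras model one batch at a time from a tf.data pipeline,
    # instead of passing a single huge numpy array to model.fit.
    for epoch in range(epochs):
        for x_batch, y_batch in dataset:  # dataset yields (features, labels)
            loss = model.train_on_batch(x_batch, y_batch)
    return loss  # loss of the last batch seen
```

DEC's self-training loop would still need per-batch updates of the target distribution, but the same batch-wise pattern applies.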