dennybritz / cnn-text-classification-tf

Convolutional Neural Network for Text Classification in Tensorflow
Apache License 2.0

Keras implementation? #124

Closed Quenjy closed 6 years ago

Quenjy commented 6 years ago

Hello,

Using this implementation with a sample of my own data, I got an 81% accuracy score on my validation set, which was pretty great. I then tried moving to my whole dataset, but apparently this implementation is quite heavy-handed when it comes to memory usage, especially when checkpointing.

So I decided to try implementing this network in Keras; here is the implementation I use:

from keras.layers import (Input, Embedding, Dropout, Convolution1D,
                          MaxPooling1D, Flatten, Concatenate, Dense)
from keras.models import Model

# Hyperparameters
filter_sizes = [3, 4, 5]
num_filters = 128
hidden_dims = 50
dropout_prob = 0.5

model_input = Input(shape=(max_sentence_len,), dtype='int32')

# Embedding layer: looks up pretrained vectors by word index; weights stay frozen
z = Embedding(word_embeddings.shape[0],
              word_embeddings.shape[1],
              input_length=max_sentence_len,
              weights=[word_embeddings],
              trainable=False)(model_input)

z = Dropout(dropout_prob)(z)

# Convolutional blocks: one convolution + pooling branch per filter size
conv_blocks = []
for sz in filter_sizes:
    conv = Convolution1D(filters=num_filters,
                         kernel_size=sz,
                         padding="valid",
                         activation="relu",
                         strides=1)(z)
    conv = MaxPooling1D(pool_size=2)(conv)
    conv = Flatten()(conv)
    conv_blocks.append(conv)
z = Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

z = Dropout(dropout_prob)(z)
z = Dense(hidden_dims, activation="relu")(z)
model_output = Dense(1, activation="sigmoid")(z)

model = Model(model_input, model_output)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train the model
hist = model.fit(X_train, y_train, batch_size=64, epochs=10,
          validation_split=0.05, verbose=1)

The way the embedding works is a bit different here: instead of storing the whole embedding vector for every word of every sentence, I store each word's index into the embedding matrix, and Keras looks up the actual embedding vector for that word when it needs it. I also use pretrained embeddings, as in the code of this repository.
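To make that concrete, here is roughly how I prepare the inputs (a minimal sketch; vocab, pretrained, and embedding_dim stand in for my own preprocessing):

import numpy as np

# vocab: word -> index (indexes start at 1; 0 is reserved for padding)
# pretrained: word -> pretrained vector of length embedding_dim
word_embeddings = np.zeros((len(vocab) + 1, embedding_dim))
for word, idx in vocab.items():
    if word in pretrained:
        word_embeddings[idx] = pretrained[word]

# Each sentence becomes a fixed-length array of indexes, zero-padded;
# the Embedding layer turns indexes back into vectors on the fly.
def encode(tokens):
    idxs = [vocab.get(w, 0) for w in tokens][:max_sentence_len]
    return np.array(idxs + [0] * (max_sentence_len - len(idxs)))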

So it is more memory efficient, but the accuracy is much worse: 71% on the validation set after 10 epochs and 79% on the training set. The implementation in this repo converges much faster, to values above 80%.

Could someone more familiar with Keras than me tell me if I made a mistake in my CNN? Or has someone implemented a more memory-efficient version of the code in this repository and would be willing to share it?
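For reference, the main structural difference I can spot is the pooling: the TF code in this repo max-pools each feature map over the entire sequence, while I pool with pool_size=2 and then flatten. If that matters, I believe the Keras equivalent of the repo's pooling would be GlobalMaxPooling1D, something like (untested):

from keras.layers import GlobalMaxPooling1D

# Alternative branch: keeps one max value per filter over the whole
# sequence, like the repo's tf.nn.max_pool over the full feature map
conv = Convolution1D(filters=num_filters, kernel_size=sz,
                     padding="valid", activation="relu", strides=1)(z)
conv = GlobalMaxPooling1D()(conv)  # output shape (batch, num_filters); no Flatten needed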