Hello,
Using this implementation on a sample of my own data, I got 81% accuracy on my validation set, which was pretty great. I then tried moving to my full dataset, but this implementation turns out to be quite heavy on memory, especially when checkpointing.
So I decided to try implementing this network in Keras. Here is the implementation I use:

    from keras.layers import (Input, Embedding, Dropout, Conv1D, MaxPooling1D,
                              Flatten, Concatenate, Dense)
    from keras.models import Model

    # word_embeddings, max_sentence_len, X_train and y_train are defined elsewhere

    # Hyperparameters
    filter_sizes = [3, 4, 5]
    num_filters = 128
    hidden_dims = 50
    dropout_prob = 0.5

    # Input: one row of word indices per sentence
    model_input = Input(shape=(max_sentence_len,), dtype='int32')

    # Frozen embedding layer initialised with the pretrained vectors
    z = Embedding(word_embeddings.shape[0],
                  word_embeddings.shape[1],
                  input_length=max_sentence_len,
                  weights=[word_embeddings],
                  trainable=False)(model_input)
    z = Dropout(dropout_prob)(z)

    # Convolutional block: one branch per filter size
    conv_blocks = []
    for sz in filter_sizes:
        conv = Conv1D(filters=num_filters,
                      kernel_size=sz,
                      padding="valid",
                      activation="relu",
                      strides=1)(z)
        conv = MaxPooling1D(pool_size=2)(conv)
        conv = Flatten()(conv)
        conv_blocks.append(conv)
    z = Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

    z = Dropout(dropout_prob)(z)
    z = Dense(hidden_dims, activation="relu")(z)
    model_output = Dense(1, activation="sigmoid")(z)

    model = Model(model_input, model_output)
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    # Train the model
    hist = model.fit(X_train, y_train, batch_size=64, epochs=10,
                     validation_split=0.05, verbose=1)

The embedding works a bit differently here: instead of storing the full embedding vector for every word of every sentence, I store each word's index in the embedding file, and Keras looks up the actual embedding vector for that word when it needs it. I also use pretrained embeddings, as I did with the code in this repository.
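To make the index-based storage concrete, here is a minimal sketch in plain Python (no Keras needed). The toy vocabulary, the 3-dimensional vectors, and the helper names `lookup` and `count_values` are made up for illustration and are not taken from my data or from this repository. It only shows the idea: sentences hold integer indices, and the vectors are fetched from a single shared table on demand.

```python
# Hypothetical toy embedding table (not real data).
# Row i is the embedding vector for the word with index i.
embedding_table = [
    [0.0, 0.0, 0.0],  # index 0: padding
    [0.1, 0.2, 0.3],  # index 1: "good"
    [0.4, 0.5, 0.6],  # index 2: "movie"
    [0.7, 0.8, 0.9],  # index 3: "bad"
]

# Each sentence is stored as word indices, padded with 0 to a fixed length.
sentences_as_indices = [
    [1, 2, 0, 0],  # "good movie"
    [3, 2, 0, 0],  # "bad movie"
]


def lookup(indices, table):
    """Replace each word index with its embedding vector."""
    return [table[i] for i in indices]


def count_values(nested):
    """Count the scalar values held in a (possibly nested) list."""
    if isinstance(nested, list):
        return sum(count_values(item) for item in nested)
    return 1


# Dense representation: every sentence expanded to full vectors up front.
sentences_as_vectors = [lookup(s, embedding_table) for s in sentences_as_indices]

index_storage = count_values(sentences_as_indices)  # one integer per token
dense_storage = count_values(sentences_as_vectors)  # one float per dimension per token

print(index_storage, dense_storage)
```

In this toy case the index form stores one number per token, while the expanded form stores one number per embedding dimension per token, so the gap grows with the embedding size. The table itself is kept only once, regardless of how many sentences there are.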
So it is more memory efficient; however, the accuracy is much worse: 71% on the validation set and 79% on the training set after 10 epochs. The implementation in this repo converges much faster to values above 80%.
Could someone more familiar with Keras than me tell me if I made a mistake in my CNN? Or has someone implemented a more memory-efficient version of the code in this repository and would be willing to share it?