elliottd closed this issue 9 years ago
It's no longer clear to me that we should do this, because it could interact unpredictably with estimating a source-language hidden vector for an image. However, we probably shouldn't add a word to the vocabulary if it is a singleton (especially if it appears in the validation data).
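A minimal sketch of the singleton rule, assuming whitespace-tokenized sentences and that the vocabulary is counted over training sentences only. `build_vocabulary` and its `min_count` parameter are hypothetical names for illustration, not the project's actual code:

```python
from collections import Counter


def build_vocabulary(train_sentences, min_count=2):
    """Hypothetical helper: map each word seen at least `min_count` times
    in the *training* sentences to an integer index.

    Singletons (count 1) are dropped, and validation sentences are never
    counted, so words that only occur there cannot enter the vocabulary.
    """
    counts = Counter(word for sentence in train_sentences
                     for word in sentence.split())
    # Sort the surviving words so index assignment is deterministic.
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    return {word: index for index, word in enumerate(kept)}


train = ["ein hund im park", "ein hund", "kaffeebohnen"]
print(build_vocabulary(train))  # {'ein': 0, 'hund': 1}
```

Here "im", "park" and "kaffeebohnen" each occur once, so they are left out of the vocabulary.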
Fixed. We don't yield a sentence if its encoding would be [,
We have a problem when yielding training examples that contain only one word. If that word is not in the vocabulary, there is essentially nothing to learn, so the example should not be yielded.
```
Traceback (most recent call last):
  File "train.py", line 138, in
    model.train_model()
  File "train.py", line 64, in train_model
    self.data_generator.yield_training_batch():
  File "data_generator.py", line 146, in yield_training_batch
    description.split())
  File "data_generator.py", line 326, in format_sequence
    seq_array)
AssertionError: time 0 sequence kaffeebohnen len w_indices 0 seq_array [[ 0. 1. 0. ..., 0. 0. 0.]
 [ 0. 0. 0. ..., 0. 0. 0.]
 [ 0. 0. 0. ..., 0. 0. 0.]
 ...,
 [ 0. 0. 0. ..., 0. 0. 0.]
 [ 0. 0. 0. ..., 0. 0. 0.]
 [ 0. 0. 0. ..., 0. 0. 0.]]
```
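The assertion fires because the one-word description "kaffeebohnen" maps to no in-vocabulary indices (`len w_indices 0`). A minimal sketch of the guard described in the fix, assuming out-of-vocabulary words are dropped rather than mapped to an unknown token. `encode` and `yield_training_examples` are hypothetical names, not the functions in `data_generator.py`:

```python
def encode(sentence, vocab):
    """Map in-vocabulary words to indices; out-of-vocabulary words are skipped."""
    return [vocab[word] for word in sentence.split() if word in vocab]


def yield_training_examples(pairs, vocab):
    """Hypothetical generator over (image_id, description) pairs.

    An example whose encoding would be empty (e.g. a one-word description
    whose only word is out of vocabulary) gives the model nothing to
    predict, so it is skipped instead of being yielded.
    """
    for image_id, description in pairs:
        indices = encode(description, vocab)
        if not indices:
            continue  # nothing to learn from this example
        yield image_id, indices


vocab = {"ein": 0, "hund": 1}
pairs = [("img1", "ein hund"), ("img2", "kaffeebohnen")]
print(list(yield_training_examples(pairs, vocab)))  # [('img1', [0, 1])]
```

Filtering in the generator keeps the downstream formatting step's invariant (at least one word index per sequence) intact, instead of relaxing the assertion.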