Closed indiejoseph closed 8 years ago
That's because we set the max_word_length pretty large number which is 64
. I remember that the longest word length in ptb dataset was around forty so this might not be a problem.
Let's check and close this issue again.
But the length of char
is away 1, i think you attempt to check the length of chars
instead.
@indiejoseph Thanks!
@carpedm20 And it should place before:
for idx in xrange(min(len(chars), max_word_length)):
output_char[word_num][idx] = chars[idx]
Isn't it?
@indiejoseph I think you are right but the original code has a same order https://github.com/yoonkim/lstm-char-cnn/blob/master/util/BatchLoaderUnk.lua#L185. @yoonkim should see this I guess because if
if len(char) == max_word_length:
chars[-1] = char2idx['}']
is in below, it will have no affect to the code. But this is not a critical problem because the max_word_length is big enough and this code will not be executed in ptb
dataset.
Line 159 in batch_loader.py