carpedm20 / lstm-char-cnn-tensorflow

in progress
MIT License
761 stars 243 forks source link

Condition never ran #4

Closed indiejoseph closed 8 years ago

indiejoseph commented 8 years ago

Line 159 in batch_loader.py

if len(char) == max_word_length:
  chars[-1] = char2idx['}']
carpedm20 commented 8 years ago

That's because we set the max_word_length pretty large number which is 64. I remember that the longest word length in ptb dataset was around forty so this might not be a problem.

carpedm20 commented 8 years ago

Let's check and close this issue again.

indiejoseph commented 8 years ago

But the length of char is away 1, i think you attempt to check the length of chars instead.

carpedm20 commented 8 years ago

@indiejoseph Thanks!

indiejoseph commented 8 years ago

@carpedm20 And it should place before:

for idx in xrange(min(len(chars), max_word_length)):
  output_char[word_num][idx] = chars[idx]

Isn't it?

carpedm20 commented 8 years ago

@indiejoseph I think you are right but the original code has a same order https://github.com/yoonkim/lstm-char-cnn/blob/master/util/BatchLoaderUnk.lua#L185. @yoonkim should see this I guess because if

if len(char) == max_word_length:
  chars[-1] = char2idx['}']

is in below, it will have no affect to the code. But this is not a critical problem because the max_word_length is big enough and this code will not be executed in ptb dataset.