Currently, `readBinary` counts the bytes of a word using `word.length`. This is not accurate when the word contains multibyte characters.
For example, `"の"` is a 3-byte character in UTF-8, but `"の".length` equals 1.
As a result, when `readBinary` reads a binary model containing multibyte words, it calculates the wrong offset and fails to load the model properly.
To fix this, I replaced `word.length` with `Buffer.from(word).byteLength`. After this fix, I was able to load my Japanese binary model generated by gensim (Python).
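A minimal Node.js sketch of the mismatch (not code from this repository, just an illustration of the two measurements):

```javascript
// "の" (U+306E) is one UTF-16 code unit but three UTF-8 bytes on disk.
const word = "の";

console.log(word.length);                  // 1 (UTF-16 code units)
console.log(Buffer.from(word).byteLength); // 3 (UTF-8 bytes)

// A reader that advances the file offset by word.length skips too few
// bytes for multibyte words, so every subsequent read is misaligned.
```

For ASCII-only words the two values coincide, which is why the bug only surfaces with models containing non-ASCII vocabulary such as Japanese.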