prasys closed this 4 years ago
Thanks a lot @amansrivastava17. I appreciate the work that you and your team have put into making a service that is easy for people to use.
Cheers 👍
@all-contributors please add prasys for bug fix related UTF-8 Encoding For Glove Embedding
@amansrivastava17
I've put up a pull request to add @prasys! :tada:
Problem Statement: UTF-8 files aren't being handled correctly on Windows. When a file is opened without an explicit encoding, Windows defaults to the Windows-1252 encoding (https://en.wikipedia.org/wiki/Windows-1252).
Why does it happen: When you run the GloVe embedding on Windows and the file contains UTF-8 words that can't be decoded as Windows-1252, you get `UnicodeDecodeError: 'charmap' codec can't decode byte 0x90`. Hence, the GloVe file has to be read explicitly as UTF-8.
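A minimal Python sketch of why the error appears, assuming the loader decodes with Windows-1252 when no encoding is given. The word "А" (Cyrillic capital A, U+0410) is just an illustrative example: its UTF-8 bytes are `D0 90`, and byte `0x90` has no mapping in Windows-1252. The helper `decodes_as` is hypothetical and not part of the project.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes without error using the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "А" (Cyrillic capital A, U+0410) is encoded in UTF-8 as the two bytes D0 90.
raw = "А".encode("utf-8")
print(raw)                         # b'\xd0\x90'
print(decodes_as(raw, "cp1252"))   # False: 0x90 is undefined in Windows-1252
print(decodes_as(raw, "utf-8"))    # True
```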
What's the fix: Open the file with an explicit UTF-8 encoding so it is read correctly on Windows. This has no side effects on OSX/Linux (I've tested both).
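A minimal sketch of the fix, assuming the embedding is loaded in Python from a standard GloVe text file (one token per line, followed by its space-separated vector components). The name `load_glove` is hypothetical and is not the project's actual loader; the relevant part is passing `encoding="utf-8"` to `open()`.

```python
from typing import Dict, List


def load_glove(path: str) -> Dict[str, List[float]]:
    """Parse a GloVe text file into a {token: vector} dict (hypothetical helper)."""
    embeddings: Dict[str, List[float]] = {}
    # Explicit UTF-8: without it, open() falls back to the platform default
    # encoding (typically Windows-1252 on Windows), which can fail on UTF-8 words.
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```

Because the encoding is stated explicitly, the same code reads the file identically on Windows, macOS and Linux instead of depending on each platform's default.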