Open GoogleCodeExporter opened 8 years ago
I noticed this problem myself. In wordvectors.py, line 171 reads one extra
character after each vector. That read swallows the first letter of the next
word. If you comment the line out, it works, i.e.
171: #fin.read(1) # newline
Original comment by matthewm...@gmail.com
on 31 Mar 2015 at 3:00
The suggestion above worked for me. Note that you have to undo the change if you
read binary files other than the Google News one. If you leave the newline read
commented out for those files, you get corrupt vocabs like this:
vocab
-->
array([u'\nthe', u'\nof', u'\nand', u'\nto', u'\nin', u'\nor', u'\na',
u'\nfor', u'\nany', u'\nby', u'\nas', u'\nThe', u'\nbe', u'\nsuch',
u'\nshall', u'\nCompany', u'\nis', u'\non', u'\n._.'],
dtype='<U78')
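One way to avoid toggling line 171 by hand would be to skip a leading newline
while reading each word, so both kinds of file load with the same code. Below is
a rough sketch of that idea, not the actual code in wordvectors.py. It assumes
the usual word2vec binary layout (a "vocab_size vector_size" header line, then
for each entry the word, a space, and vector_size little-endian float32 values,
optionally followed by a newline). The names read_word2vec_binary and
make_sample are made up for illustration.

```python
import io
import struct


def read_word2vec_binary(fin):
    """Read a word2vec-style binary stream, with or without a newline after each vector."""
    # Header line: b"<vocab_size> <vector_size>\n"
    vocab_size, vector_size = map(int, fin.readline().split())
    vector_bytes = vector_size * 4  # each component is a 4-byte float32

    vocab, vectors = [], []
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = fin.read(1)
            if ch == b"":
                raise EOFError("unexpected end of file while reading a word")
            if ch == b" ":
                break  # the space terminates the word
            if ch != b"\n":  # drop a newline left over from the previous vector
                chars.append(ch)
        vocab.append(b"".join(chars).decode("utf-8"))
        data = fin.read(vector_bytes)
        vectors.append(struct.unpack("<%df" % vector_size, data))
    return vocab, vectors


def make_sample(with_newline):
    """Build a tiny in-memory file in either layout (hypothetical test data)."""
    entries = [("the", (1.0, 2.0)), ("of", (3.0, 4.0))]
    buf = io.BytesIO()
    buf.write(b"2 2\n")
    for word, vec in entries:
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack("<2f", *vec))
        if with_newline:
            buf.write(b"\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for with_newline in (True, False):
        vocab, vectors = read_word2vec_binary(make_sample(with_newline))
        print(with_newline, vocab, vectors)
```

Because the vector is read as a fixed number of bytes, a 0x0A byte inside the
float data is never mistaken for a separator; only a newline sitting in front of
a word is dropped.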
Original comment by mtsh...@gmail.com
on 4 Jun 2015 at 2:40
Original issue reported on code.google.com by
moza...@gmail.com
on 15 Dec 2014 at 7:54