Maluuba / gensen

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
Other
311 stars, 43 forks

parsing of glove embeddings #8

Closed: tilmanbeck closed this issue 5 years ago

tilmanbeck commented 5 years ago

Hi,

Thanks for the code! The provided `glove2h5.py` does not work on my machine. Some words in the GloVe file contain spaces, so the code crashes when it tries to convert the split fields to float. The following lines should be changed. Change `vocab = [line[0] for line in glove_vectors]` to `vocab = [' '.join(line[0:-300]) for line in glove_vectors]`

and change `vectors = np.array([[float(val) for val in line[1:]] for line in glove_vectors]).astype(np.float32)` to `vectors = np.array([[float(val) for val in line[-300:]] for line in glove_vectors]).astype(np.float32)`. Both changes assume 300-dimensional vectors: the last 300 fields of each line are the vector, and everything before them is the word.
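The same idea can be sketched as a standalone parser. This is not the code in `glove2h5.py`; the function name `parse_glove_lines` is hypothetical, and it assumes fields are separated by single spaces and that every vector has `dim` components (300 for the file discussed above). Splitting from the right with `str.rsplit(" ", dim)` keeps any spaces inside the word in the first field, which is equivalent to the `' '.join(line[0:-300])` fix.

```python
from typing import Iterable, List, Tuple


def parse_glove_lines(lines: Iterable[str], dim: int = 300) -> Tuple[List[str], List[List[float]]]:
    """Split GloVe text lines into (vocab, vectors).

    Hypothetical helper: the last `dim` space-separated fields are treated as
    the vector; everything before them is the word, which may contain spaces.
    """
    vocab: List[str] = []
    vectors: List[List[float]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip()  # drop the newline and any trailing whitespace
        if not line:
            continue  # skip blank lines
        # Split from the right at most `dim` times so spaces inside the word
        # stay together in fields[0].
        fields = line.rsplit(" ", dim)
        if len(fields) != dim + 1:
            raise ValueError(f"line {lineno}: expected {dim} values, got {len(fields) - 1}")
        vocab.append(fields[0])
        vectors.append([float(v) for v in fields[1:]])
    return vocab, vectors


# Usage with a toy 3-dimensional file containing a multi-word token.
sample = ["the 0.1 0.2 0.3\n", "new york 0.4 0.5 0.6\n"]
vocab, vectors = parse_glove_lines(sample, dim=3)
print(vocab)    # ['the', 'new york']
print(vectors)  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

In `glove2h5.py` the resulting `vectors` list would then be wrapped with `np.array(...).astype(np.float32)` as in the fix above. Checking the field count per line surfaces a malformed line with its line number instead of a bare `float()` conversion error.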

xingdi-eric-yuan commented 5 years ago

Hi @tilmanbeck, I encountered a similar issue. I have applied the changes you mentioned and made the code run on Python 3 and newer PyTorch versions: https://github.com/xingdi-eric-yuan/gensen/tree/py3pytorch.4