dr-slurp closed this issue 5 years ago.
Or if you could point me to the code that actually parses the vectors, I could probably figure it out myself.
Thanks
Here's the relevant part that saves out the binary vectors:
However, I've been refactoring the library (see #77), and the vectors are now stored using spaCy's Vectors class, which is serialized as numpy arrays. The relevant part of the code is here:
This will probably also make it easier to write your loader in Java. All other data (frequency counts, strings, config) will be stored as JSON btw.
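Because the refactored library serializes the vectors as numpy arrays, a Java loader mostly needs to understand NumPy's .npy file format: a magic string, a version, a header length, an ASCII header dict describing dtype, memory order and shape, and then the raw data. Below is a minimal sketch, assuming the array is a 2-D, C-ordered, little-endian float32 matrix ('<f4'). The class and method names (NpyReader, readFloat32Matrix, load) are hypothetical and not part of sense2vec or spaCy, and this does not apply to the old reddit_vectors.bin format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal reader for 2-D float32 .npy arrays (a sketch, not a full parser). */
final class NpyReader {
    private static final Pattern DESCR = Pattern.compile("'descr':\\s*'([^']*)'");
    private static final Pattern FORTRAN = Pattern.compile("'fortran_order':\\s*(True|False)");
    private static final Pattern SHAPE =
            Pattern.compile("'shape':\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*,?\\s*\\)");

    private NpyReader() {}

    /** Parses a .npy payload; assumes a C-ordered, little-endian float32 matrix. */
    public static float[][] readFloat32Matrix(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        // Magic string: the byte 0x93 followed by ASCII "NUMPY".
        byte[] magic = new byte[6];
        buf.get(magic);
        if ((magic[0] & 0xFF) != 0x93
                || !new String(magic, 1, 5, StandardCharsets.US_ASCII).equals("NUMPY")) {
            throw new IllegalArgumentException("not a .npy file");
        }
        int major = buf.get() & 0xFF;
        buf.get(); // minor version, not needed here
        // Format 1.0 stores the header length as a uint16; 2.0 and 3.0 use a uint32.
        int headerLen = (major == 1) ? (buf.getShort() & 0xFFFF) : buf.getInt();
        byte[] headerBytes = new byte[headerLen];
        buf.get(headerBytes);
        String header = new String(headerBytes, StandardCharsets.ISO_8859_1);

        String descr = group(DESCR, header);
        if (!descr.equals("<f4")) {
            throw new IllegalArgumentException("expected '<f4', got '" + descr + "'");
        }
        if (group(FORTRAN, header).equals("True")) {
            throw new IllegalArgumentException("Fortran-ordered arrays are not handled");
        }
        Matcher shape = SHAPE.matcher(header);
        if (!shape.find()) {
            throw new IllegalArgumentException("expected a 2-D shape in header: " + header);
        }
        int rows = Integer.parseInt(shape.group(1));
        int cols = Integer.parseInt(shape.group(2));

        // Row-major float data follows the header directly.
        float[][] matrix = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                matrix[r][c] = buf.getFloat();
            }
        }
        return matrix;
    }

    /** Memory-maps a file and parses it; a single mapping is limited to 2 GB. */
    public static float[][] load(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return readFloat32Matrix(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        }
    }

    private static String group(Pattern p, String header) {
        Matcher m = p.matcher(header);
        if (!m.find()) {
            throw new IllegalArgumentException("missing " + p.pattern() + " in header: " + header);
        }
        return m.group(1);
    }
}
```

You would call NpyReader.load(...) on the numpy file written by the refactored library (check the serialization code for the actual file name). The matrix only holds the vector rows; mapping rows back to keys and strings needs the other serialized data, which this sketch does not read.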
I would appreciate the Java solution if you have it.
Hey!
I'm wondering how reddit_vectors.bin is formatted. I want to build a tool that reads the sense2vec Reddit vectors in Java, since the rest of my pipeline is in Java. I'm having trouble decoding the binary, so I'd appreciate any hints as to how the vectors are stored in it. Is there a plain-text version of the vectors available?
Thanks in advance