fanavarro closed this issue 3 years ago
Hi Francisco
With the addition I made in the other issue, OWL2Vec can generate three files: .txt, .bin and .embeddings. To load the keyed vectors, the best option is to use the .embeddings file.
Ernesto
Hi Ernesto, thanks again for your help. I was able to load the keyed vectors with:
KeyedVectors.load(datapath('output.embeddings'), mmap='r')
Greetings.
Hi, I've been playing a little bit with this amazing library by obtaining the embeddings as described in #2. The standalone application generates a txt and a bin file with the keyed vectors in textual and binary formats, respectively. In particular, I'm calculating the embeddings from the gene ontology, included in the repository.
Nonetheless, I'm running into several problems when I try to load the previously generated vectors. On the one hand, I tested the following Python call:
KeyedVectors.load_word2vec_format(datapath('output.bin'), binary=True)
But I get an exception:
On the other hand, I also tried to load the txt file with:
KeyedVectors.load_word2vec_format(datapath('output.txt'), binary=False)
Obtaining the following exception:
I've been debugging the app, and I found that the incorrect line is the following one:
OBSOLETE. (Was not defined before being made obsolete). -0.35172817 0.9547997 -0.7017195 -0.022278534 -0.21855797 1.2328295 0.026366502 1.0293199 -0.42764938 -0.8031358 -0.7505182 -0.01582495 -1.4183652 0.68057406 0.22078635 0.75405 0.32506666 -1.7469246 0.62090874 0.33088538 0.32958925 -0.21696554 -0.99827904 -1.1616639 -1.3286982 0.89662665 -1.1478066 0.39570102 -0.28800654 0.6889498 1.2787603 1.2980725 -0.19311273 0.61996716 2.1367197 0.5362677 0.38471636 1.7419933 -0.2525881 -1.0632398 -0.23395675 0.9228735 1.0655191 -1.2626935 1.8425548 -0.2289917 -1.3743287 -1.0106764 1.1029646 0.26697654 -0.05864819 -0.5478173 -0.6971337 -1.7715415 0.2442582 -1.2734476 0.25903603 0.6714998 0.0923138 -0.70214653 -0.024936976 -1.3333995 -1.1616304 0.052265227 0.6952294 0.6618334 -0.9966148 1.3055371 2.9172845 1.5078834 2.4491236 -0.41737756 -0.8264428 1.9000809 -0.18261702 0.25123483 0.7783439 0.16481185 0.3635699 -0.29046142 0.54508567 1.2136813 -1.8205711 -1.4147732 0.719116 0.08283793 0.5585965 0.10322688 1.9780725 -1.2655574 0.51070905 -0.9030711 -0.94760007 1.2188694 1.1546952 -0.95993125 1.3770614 0.1960414 -1.4413091 0.20371768
I think the load function expects a single word followed by its vector but, in this case, the key consists of several words. I am using gensim version 3.8.0 both when running OWL2Vec* and when loading the generated vectors. Do you have any clue why this line is included in the embedding files? Should I do some kind of ontology preprocessing, i.e. removing special characters, to avoid this?
Kind regards and thanks for your work.
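For anyone who wants to inspect the .txt output before switching to the .embeddings file recommended in this thread, here is a minimal sketch in plain Python. It assumes the usual word2vec text layout (a first line with "vocab_size vector_size", then one entry per line) and treats the last vector_size fields of each entry as the vector, so a key that contains spaces stays in one piece. The function names parse_header, split_entry and find_multiword_lines are hypothetical helpers written for this illustration; they are not part of gensim or OWL2Vec*.

```python
from typing import Iterable, List, Tuple


def parse_header(line: str) -> Tuple[int, int]:
    """Parse the 'vocab_size vector_size' header of a word2vec text file."""
    vocab_size, dim = line.split()
    return int(vocab_size), int(dim)


def split_entry(line: str, dim: int) -> Tuple[str, List[float]]:
    """Split one entry into (key, vector).

    The last `dim` space-separated fields are taken as the vector and
    everything before them as the key, so the key may contain spaces.
    """
    fields = line.strip().rsplit(" ", dim)
    if len(fields) != dim + 1:
        raise ValueError(f"expected a key plus {dim} values: {line!r}")
    key = fields[0]
    vector = [float(value) for value in fields[1:]]
    return key, vector


def find_multiword_lines(lines: Iterable[str], dim: int) -> List[int]:
    """Return 1-based line numbers (header is line 1) whose key has spaces."""
    offending = []
    for number, line in enumerate(lines, start=1):
        if number == 1:
            continue  # skip the header line
        if len(line.split()) > dim + 1:
            offending.append(number)
    return offending
```

Running find_multiword_lines over the lines of output.txt should list entries like the "OBSOLETE. (Was not defined ...)" one quoted above, and split_entry can still recover the key and vector for such an entry. This is only a diagnostic aid under the stated assumptions, not a replacement for KeyedVectors.load on the .embeddings file.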