LuxunXu opened this issue 4 years ago
The paper mentions that the cosine similarity between (Titanic, sank@nsubj) is 0.11, while the similarity of (iceberg, sank@nsubj) is -0.005. However, using all six models provided, I could not reproduce these numbers. What could be the issue?
```python
emb.similarity('titanic', 'sink@nsubj')
emb.similarity('iceberg', 'sink@nsubj')
```
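For reference, here is a minimal, self-contained sketch of the check I am running. It rests on assumptions: that the released embeddings are in word2vec text format and loadable with gensim, and that `deps_embeddings.txt` is a placeholder name for one of the six downloads.

```python
# Minimal sketch of the similarity check; assumes word2vec text format.
# 'deps_embeddings.txt' is a placeholder for one of the six released files.
from gensim.models import KeyedVectors

emb = KeyedVectors.load_word2vec_format('deps_embeddings.txt', binary=False)

for word, ctx in [('titanic', 'sink@nsubj'), ('iceberg', 'sink@nsubj')]:
    if word in emb and ctx in emb:
        print(word, ctx, emb.similarity(word, ctx))
    else:
        print(word, ctx, 'not in vocabulary')
```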
Thanks for trying out the embeddings, I think you might be the first to do so ;-) For the paper we tested quite a few different preprocessing methods, like cased vs. uncased, lemmatized vs. not lemmatized, and normalized vs. non-normalized embeddings, but did not upload all of those versions. I believe the similarities you mentioned were calculated with cased, lemmatized, non-normalized embeddings, which are not among the ones linked in this repository. Unfortunately, I cannot check this myself either, since I have since moved to a different institution.
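For anyone hitting the same discrepancy: since the answer above points at preprocessing variants (casing, lemmatization, normalization), one way to narrow things down is to query every released model with both casings of the word and both the surface and lemma forms of the verb context. This is a sketch under the same gensim/word2vec-format assumptions as above, and the model file names are placeholders, not the actual download names.

```python
# Probe every token variant across the released models.
# File names are placeholders; substitute the six models linked in the repo.
from gensim.models import KeyedVectors

MODEL_FILES = ['model_a.txt', 'model_b.txt']
WORDS = ['Titanic', 'titanic', 'iceberg']      # cased vs. uncased
CONTEXTS = ['sank@nsubj', 'sink@nsubj']        # surface form vs. lemma

for path in MODEL_FILES:
    emb = KeyedVectors.load_word2vec_format(path, binary=False)
    for w in WORDS:
        for c in CONTEXTS:
            if w in emb and c in emb:
                print(f'{path}: sim({w!r}, {c!r}) = {emb.similarity(w, c):.4f}')
```

If none of the variants reproduces 0.11 and -0.005, that is consistent with the explanation above: the exact model used in the paper (cased, lemmatized, non-normalized) was not uploaded.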