bheinzerling / selpref-emb

Selectional Preference Embeddings (EMNLP 2017)
MIT License

Cannot reproduce result in the paper #1

Open LuxunXu opened 4 years ago

LuxunXu commented 4 years ago

The paper states that the cosine similarity of (Titanic, sank@nsubj) is 0.11, while the similarity of (iceberg, sank@nsubj) is -0.005. However, using all six models provided, I could not reproduce these numbers. What is the issue?

```python
emb.similarity('titanic', 'sink@nsubj')
emb.similarity('iceberg', 'sink@nsubj')
```
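For context, here is a minimal sketch of the full check, assuming the released vectors load with gensim's `KeyedVectors` (the `emb` name and the file name below are assumptions on my part, not something specified by the repository):

```python
from gensim.models import KeyedVectors

# Hypothetical file name; substitute the actual embedding file linked in the repo.
emb = KeyedVectors.load_word2vec_format('selpref-emb.txt', binary=False)

# Cosine similarity between a noun and a predicate@role token,
# using lowercased, lemmatized keys as in the snippet above.
print(emb.similarity('titanic', 'sink@nsubj'))
print(emb.similarity('iceberg', 'sink@nsubj'))
```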

bheinzerling commented 4 years ago

Thanks for trying out the embeddings, I think you might be the first to do so ;-) For the paper we tested quite a few different preprocessing methods (cased vs. uncased, lemmatized vs. not lemmatized, normalized vs. unnormalized embeddings, etc.), but did not upload all of those versions. I think the similarities you mentioned were calculated with cased, lemmatized, unnormalized embeddings, which are not among the ones linked in this repository. Unfortunately, I cannot check this myself either, since I have moved to a different institution.
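One way to tell which preprocessing variant a downloaded model matches is to probe its vocabulary for cased vs. uncased and lemmatized vs. surface-form keys. A rough sketch, reusing the `emb` object loaded above and assuming gensim 4.x, where the vocabulary is exposed as `key_to_index` (the candidate keys are guesses, not the repository's documented vocabulary):

```python
# Probe which key variants actually exist in the loaded model's vocabulary.
candidates = ['titanic', 'Titanic', 'sink@nsubj', 'sank@nsubj']
for key in candidates:
    print(key, key in emb.key_to_index)
```

Whichever variants are present would indicate the casing and lemmatization used for that particular file.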