concepticon / norare-data

Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts

Cross-Lingual Similarity Datasets #134

Open LinguList opened 3 years ago

LinguList commented 3 years ago

While reading the Multi-SimLex paper, I realized that there is an established tradition behind these datasets, although they are small and have nothing to do with historical linguistics or psychology: http://lcl.uniroma1.it/similarity-datasets/

Here is a paper on the topic: https://www.aclweb.org/anthology/P15-2001/

There are thus many more resources that could theoretically be united.

LinguList commented 3 years ago

There was even a SemEval task at an ACL conference: https://alt.qcri.org/semeval2017/task2/index.php?id=data-and-tools

They also offer the test data. Maybe it is sufficient to add this one.

LinguList commented 3 years ago

The original article introducing this was apparently this one:

Original English RG-65 word similarity dataset: Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627-633.

Online here: https://dl.acm.org/doi/10.1145/365628.365657