concepticon / norare-data

Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts

MRC psycholinguistic database #2

Open LinguList opened 4 years ago

LinguList commented 4 years ago

The data is here:

https://github.com/samzhang111/mrc-psycholinguistics/

Its structure is a bit complicated, but I think it can be used for our purposes, and it would be good to have a complete update of the data.

We should, as we did in the past, do an automated mapping, of course.

AnnikaTjuka commented 4 years ago

I had another look at the purpose of the MRC and I am not sure if we should include the database in NoRaRe.

"It is designed to be of use to psycholinguists in selecting stimulus materials for testing; for use by researchers in Artificial Intelligence as a source of information required for natural language processing and cognitive simulation; and for use by computer scientists who wish to use the word lists and syntactic information in the design of text processors."

If we want to differentiate our data from other resources that offer "stimulus generators", it might not be necessary to add this dataset. What do you think, @LinguList?

LinguList commented 4 years ago

As they have frequencies etc., and we already have it in Concepticon, I'd still add it, also because of its long-standing role as a database for norms of some kind. In fact, all the NoRaRe-type data we currently have in Concepticon should still be added, including datasets that have a slightly different purpose, such as Wikipedia, babelphy, or WordNet. I would still add those (and I should probably start adding them), because they provide relations in the broader sense: WordNet is a clear relation, as is Wikipedia, with its source on categories. We'll see how feasible things are. If I get stuck in parts, I'd then rather leave it...

AnnikaTjuka commented 4 years ago

That sounds like a good solution. And we already have 29 datasets for NoRaRe ;-)