LinguList opened this issue 4 years ago
I had another look at the stated purpose of the MRC Psycholinguistic Database, and I am not sure whether we should include it in NoRaRe.
"It is designed to be of use to psycholinguists in selecting stimulus materials for testing; for use by researchers in Artificial Intelligence as a source of information required for natural language processing and cognitive simulation; and for use by computer scientists who wish to use the word lists and syntactic information in the design of text processors."
If we want to differentiate our data from other resources that offer "stimulus generators", it might not be necessary to add this dataset. What do you think, @LinguList?
As it has frequencies etc., and we already have it in Concepticon, I would still add it, also because of its long-standing role as a database of norms of some kind. In fact, all the NoRaRe-type lists we currently have in Concepticon should still be added, including datasets with a slightly different purpose, such as Wikipedia, Babelfy, or WordNet. I would add those too (and I should probably start adding them), because they provide relations in a broader sense: WordNet clearly encodes relations, and so does Wikipedia through its categories. We'll see how feasible this is. If I get stuck on some parts, I would rather leave them out...
That sounds like a good solution. And we already have 29 datasets for NoRaRe ;-)
The data is here:
https://github.com/samzhang111/mrc-psycholinguistics/
Its structure is a bit complicated, but I think it can be used for our purpose, and it would be good to have a complete update of the data.
Of course, as in the past, we should do an automated mapping.
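For illustration only, here is a minimal sketch of what a first-pass automated mapping could look like: each word from the word list is matched to a concept by case-insensitive exact comparison with concept glosses. This is an assumption, not necessarily the procedure used for earlier NoRaRe datasets. The concept IDs (`C1`, `C2`, ...), the glosses, and the function names `normalize`, `build_index`, and `map_words` are hypothetical toy examples, not real Concepticon or MRC data or APIs.

```python
# Hypothetical sketch: naive word-to-concept mapping by exact gloss match.
# The IDs and glosses below are toy data, NOT real Concepticon/MRC entries.
from typing import Dict, List, Optional


def normalize(word: str) -> str:
    """Lowercase and strip surrounding whitespace so ' Dog ' matches 'DOG'."""
    return word.strip().lower()


def build_index(concept_glosses: Dict[str, str]) -> Dict[str, str]:
    """Map each normalized gloss to its concept ID.

    If two IDs share the same normalized gloss, the later one wins.
    """
    return {normalize(gloss): cid for cid, gloss in concept_glosses.items()}


def map_words(words: List[str], index: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the matched concept ID for each word, or None if unmatched."""
    return {word: index.get(normalize(word)) for word in words}


if __name__ == "__main__":
    glosses = {"C1": "DOG", "C2": "TREE", "C3": "WATER"}  # hypothetical IDs
    index = build_index(glosses)
    print(map_words(["dog", "Water", "sadness"], index))
    # {'dog': 'C1', 'Water': 'C3', 'sadness': None}
```

Exact matching is only a baseline: words that come back as `None` (or that match the wrong sense) would still need manual review.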