clics / clicsbp

CLDF dataset on Body Part Colexifications
Creative Commons Attribution 4.0 International

Concept Lists #4

Closed LinguList closed 2 years ago

LinguList commented 3 years ago

I suggest the following concept lists for testing:

  1. emotions (Jackson-2019-XX)
  2. color (not sure how to determine, maybe concepticon tags?)
  3. body part list (needs to be determined via coverage statistics)

The coverage statistics should count, across all languages and language families, how many gaps we find in the data for each concept. This should be straightforward to do in cltoolkit.
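To make the idea of counting gaps concrete, here is a minimal sketch in plain Python. It deliberately does not use the cltoolkit API; the function name `coverage` and the `(language, family, concept)` triples are assumptions made for illustration, and the sample records are toy data, not real lexibank counts.

```python
from collections import defaultdict


def coverage(records, concepts):
    """Count, per concept, how many languages and families attest it.

    `records` is an iterable of (language, family, concept) triples
    (a hypothetical input format, not the cltoolkit data model).
    A "gap" is a language in the sample that has no form for the concept.
    """
    langs_by_concept = defaultdict(set)
    fams_by_concept = defaultdict(set)
    all_langs = set()

    for language, family, concept in records:
        all_langs.add(language)
        if concept in concepts:
            langs_by_concept[concept].add(language)
            fams_by_concept[concept].add(family)

    stats = {}
    for concept in concepts:
        n_langs = len(langs_by_concept[concept])
        stats[concept] = {
            "languages": n_langs,
            "families": len(fams_by_concept[concept]),
            "gaps": len(all_langs) - n_langs,
        }
    return stats


# Toy data for illustration only.
records = [
    ("German", "Indo-European", "EAR"),
    ("German", "Indo-European", "BLUE"),
    ("English", "Indo-European", "EAR"),
    ("Mandarin", "Sino-Tibetan", "EAR"),
]
stats = coverage(records, ["EAR", "BLUE", "HATE"])
```

A candidate concept list would then be the set of concepts whose gap count stays below some threshold, which still has to be chosen.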

AnnikaTjuka commented 3 years ago

I had a look at the possible lists:

@LinguList Would it make sense if I created a new list of available concepts in the current Concepticon version and added a tag column? For example:

  ID    CONCEPT          TAG
  18    EARLOBE          bodypart
  837   BLUE             color
  3749  HATE (LOATHING)  emotion

LinguList commented 3 years ago

In fact, why not? This is useful metadata. If you first present this list as a blog post, we can also compute coverage against our current lexibank data collection. I'd show you how to do that in due course; it would give you, for each concept, a direct overview of the number of languages and language families, and of the state of the data (transcription, orthographic form only, etc.).

This would be an important step for the analysis here anyway. You'd also learn how to run CL Toolkit on all of the lexibank data (a bit time-consuming, but useful), and we'd have closer integration with pylexibank, which we could later use to filter the data we want to use!
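As a rough sketch of what such filtering could look like, assuming the tagged list is stored as a tab-separated file with the columns ID, CONCEPT, and TAG: the helper names `concepts_by_tag` and `filter_forms`, and the `concepticon_id` key on the form dictionaries, are hypothetical and not part of pylexibank or CL Toolkit.

```python
import csv
import io

# The tagged concept list from the example above, as tab-separated text.
TAGGED = (
    "ID\tCONCEPT\tTAG\n"
    "18\tEARLOBE\tbodypart\n"
    "837\tBLUE\tcolor\n"
    "3749\tHATE (LOATHING)\temotion\n"
)


def concepts_by_tag(tsv_text, tag):
    """Return {ID: CONCEPT} for all rows carrying the given tag."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["ID"]: row["CONCEPT"] for row in reader if row["TAG"] == tag}


def filter_forms(forms, concept_ids):
    """Keep only forms linked to one of the selected concept IDs.

    `forms` is a list of dicts with a hypothetical "concepticon_id" key.
    """
    return [form for form in forms if form["concepticon_id"] in concept_ids]


body_parts = concepts_by_tag(TAGGED, "bodypart")

# Toy forms for illustration only.
forms = [
    {"language": "German", "concepticon_id": "18", "form": "Ohrläppchen"},
    {"language": "German", "concepticon_id": "837", "form": "blau"},
]
selected = filter_forms(forms, body_parts)
```

Keeping the tag as a plain column means the same file can serve the emotion, color, and body part lists without maintaining three separate files.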

AnnikaTjuka commented 3 years ago

Cool! That's exactly what I was thinking of. I'll start working on the list and get back to you when it is ready, so that we can test the CL Toolkit part.