clics / clicsbp

CLDF dataset on Body Part Colexifications
Creative Commons Attribution 4.0 International

Madv #28

Closed: LinguList closed this issue 2 years ago

LinguList commented 2 years ago

This is a first test of the coverage data.

The workflow is now a bit circular, but that is hard to avoid:

  1. load the data into CLDF
  2. decide on our coverage subset (concepts, and languages by glottocode, sorted by number of concepts, with only one variety per glottocode)
  3. write a new concepts.tsv and a new languages.tsv in etc (see the sketch after this list)
  4. re-run the data creation, this time filtering by these languages and concepts
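
For steps 2 and 3, here is a minimal sketch of how the selection could be derived from the built CLDF data. It assumes the default wordlist file and column names (cldf/forms.csv with Language_ID/Parameter_ID, cldf/languages.csv with ID/Glottocode); the etc/ file name follows the list above, and the exact columns may differ in our build:

```python
"""Sketch for steps 2-3: keep one variety per glottocode, ranked by coverage.

Assumes the default CLDF wordlist file and column names; adjust to the
actual dataset. The etc/ file name follows the workflow above.
"""
import csv
from collections import defaultdict

# Count distinct concepts attested per language variety.
concepts_by_language = defaultdict(set)
with open("cldf/forms.csv", newline="", encoding="utf8") as f:
    for row in csv.DictReader(f):
        concepts_by_language[row["Language_ID"]].add(row["Parameter_ID"])

# Keep only the best-covered variety per glottocode.
best = {}  # glottocode -> (language row, number of concepts)
with open("cldf/languages.csv", newline="", encoding="utf8") as f:
    for lang in csv.DictReader(f):
        if not lang["Glottocode"]:
            continue
        n = len(concepts_by_language.get(lang["ID"], set()))
        if lang["Glottocode"] not in best or n > best[lang["Glottocode"]][1]:
            best[lang["Glottocode"]] = (lang, n)

# Write the selection, sorted by number of concepts, to etc/languages.tsv.
with open("etc/languages.tsv", "w", newline="", encoding="utf8") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["ID", "Glottocode", "Concepts"])
    for lang, n in sorted(best.values(), key=lambda x: x[1], reverse=True):
        w.writerow([lang["ID"], lang["Glottocode"], n])
```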

Alternatively, the whole data creation with makecldf can be done by running this very coverage analysis. This would mean that makecldf runs quite a long time. But on the other hand: we'd only have to run it once, and if we KNOW what coverage we aim for, we can reduce the iteration, which is quite extensive now, but can be reduced then to selecting 1000 concepts and 1500 languages or similar.