The workflow is currently a bit circular, but that is hard to avoid:
1. load the data into CLDF
2. decide on the coverage subset: which concepts to keep, and which languages, identified by Glottocode, sorted by number of attested concepts, with only one variety per Glottocode (see the sketch below)
3. write a new concepts.tsv and a new languages.tsv in etc
4. re-run the data creation, this time filtering by the selected languages and concepts
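A minimal sketch of step 2, assuming the CLDF wordlist has already been built under cldf/ with a forms.csv (columns Language_ID, Parameter_ID) and a languages.csv (columns ID, Glottocode); paths, column names, and the output layout of the etc/ files are assumptions about the concrete dataset, not a fixed convention. It counts distinct concepts per variety, keeps the best-covered variety per Glottocode, and writes candidate languages.tsv and concepts.tsv files:

```python
# Sketch: derive the coverage subset from an already-built CLDF wordlist.
# Assumes cldf/forms.csv and cldf/languages.csv with the columns used below;
# adjust paths and column names to the actual dataset.
import csv
from collections import defaultdict
from pathlib import Path

CLDF = Path("cldf")
ETC = Path("etc")

# Distinct concepts attested per variety.
concepts_by_language = defaultdict(set)
with open(CLDF / "forms.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        concepts_by_language[row["Language_ID"]].add(row["Parameter_ID"])

# Glottocode per variety.
glottocode = {}
with open(CLDF / "languages.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        glottocode[row["ID"]] = row["Glottocode"]

# Keep only the best-covered variety per Glottocode.
best = {}  # glottocode -> (variety ID, number of concepts)
for lid, concepts in concepts_by_language.items():
    gc = glottocode.get(lid)
    if not gc:
        continue
    if gc not in best or len(concepts) > best[gc][1]:
        best[gc] = (lid, len(concepts))

# Sort Glottocodes by concept count, best-covered varieties first.
selected = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
with open(ETC / "languages.tsv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["ID", "Glottocode", "Concepts"])
    for gc, (lid, n) in selected:
        w.writerow([lid, gc, n])

# Concepts ranked by how many selected varieties attest them,
# as a simple basis for deciding which concepts to keep.
selected_ids = {lid for lid, _ in best.values()}
concept_counts = defaultdict(int)
for lid in selected_ids:
    for cid in concepts_by_language[lid]:
        concept_counts[cid] += 1
with open(ETC / "concepts.tsv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["ID", "Languages"])
    for cid, n in sorted(concept_counts.items(), key=lambda kv: kv[1], reverse=True):
        w.writerow([cid, n])
```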
Alternatively, the whole data creation with makecldf could be driven by this very coverage analysis. makecldf would then take quite a long time to run, but on the other hand we would only have to run it once; and if we KNOW what coverage we aim for, the iteration, which is quite extensive now, can be reduced to selecting 1000 concepts and 1500 languages or similar.
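As a rough illustration of that alternative, here is a sketch of how the coverage filter could sit inside a pylexibank dataset's cmd_makecldf. The etc/ file names, the ID column, the raw data file, and the dataset id are all illustrative assumptions, and the actual form-adding call depends on how the dataset normally ingests its raw data:

```python
# Sketch: filter during makecldf instead of subsetting afterwards.
# etc/ file names, the ID column, and the raw data loop are assumptions;
# adapt them to the concrete lexibank dataset.
import csv
import pathlib

import pylexibank


def read_ids(path, column="ID"):
    """Read an allow-list of IDs from a TSV file in etc/."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f, delimiter="\t")}


class Dataset(pylexibank.Dataset):
    dir = pathlib.Path(__file__).parent
    id = "mydataset"  # hypothetical dataset id

    def cmd_makecldf(self, args):
        # Allow-lists produced by the coverage analysis.
        keep_languages = read_ids(self.etc_dir / "languages.tsv")
        keep_concepts = read_ids(self.etc_dir / "concepts.tsv")

        # Hypothetical raw data file; the real loop depends on the dataset.
        for row in self.raw_dir.read_csv("data.csv", dicts=True):
            if row["Language_ID"] not in keep_languages:
                continue
            if row["Parameter_ID"] not in keep_concepts:
                continue
            # Add the form as the dataset normally would, e.g.:
            args.writer.add_forms_from_value(
                Language_ID=row["Language_ID"],
                Parameter_ID=row["Parameter_ID"],
                Value=row["Value"],
            )
```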
This is a first test of the coverage data.