clics / pyclics

Python package implementing the CLICS processing workflow
Apache License 2.0

Best coverage subsets for three varying numbers of datasets #18

Open LinguList opened 6 years ago

LinguList commented 6 years ago

If we follow the plan to offer three different networks, namely one high-coverage network with many languages and, say, 300 concepts, one with fewer languages but more concepts (say, 600), and one with the maximum we can get, we need to use the coverage code in lingpy to account for this (see the sketch below).
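
A minimal sketch of the selection step, assuming word lists are given as a plain mapping from language ID to the set of concepts that language attests. The helper and its input format are hypothetical illustrations of the idea, not the actual lingpy coverage code:

```python
from collections import Counter

def best_coverage_subset(wordlists, n_concepts):
    """Pick the n_concepts concepts attested in the most languages.

    `wordlists` maps a language ID to the set of concepts for which
    the language has at least one counterpart (hypothetical format).
    """
    counts = Counter()
    for concepts in wordlists.values():
        counts.update(concepts)
    # Concepts ranked by how many languages attest them.
    return [concept for concept, _ in counts.most_common(n_concepts)]

# Hypothetical toy data.
wordlists = {
    "lang1": {"HAND", "ARM", "SUN", "MOON"},
    "lang2": {"HAND", "SUN"},
    "lang3": {"HAND", "ARM", "SUN"},
}
print(best_coverage_subset(wordlists, 2))  # e.g. ['HAND', 'SUN']
```

Running this once per target size (300, 600, maximum) would yield the three concept subsets for the three networks.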

This code is now straightforward, but the question is: do we actually still need this, or should we rather just take the full dump of 2000 concepts? Given that we know the frequency of each concept in CLICS, we could easily visualize this by scaling node size accordingly. And the communities still make sense; so far, we do not suffer from skewed data...

xrotwang commented 5 years ago

I think this could be solved by adding some sort of frequency measure (the percentage of languages having a counterpart for a concept) to the concept labels, or by using the frequency for bubble size in the visualization.
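
A rough sketch of how such a measure could be attached to the graph, assuming the colexification network is a networkx graph whose nodes are concepts. The attribute name `coverage_pc`, the helper, and the input format are hypothetical, not part of pyclics:

```python
import networkx as nx

def add_coverage_percentage(graph, wordlists):
    """Annotate each concept node with the percentage of languages
    that have a counterpart for it; usable as a bubble-size scale.

    `wordlists` maps a language ID to the set of attested concepts
    (hypothetical format, as in the sketch above).
    """
    n_langs = len(wordlists)
    for node in graph.nodes:
        covered = sum(1 for concepts in wordlists.values() if node in concepts)
        graph.nodes[node]["coverage_pc"] = 100.0 * covered / n_langs

# Toy example with hypothetical data.
g = nx.Graph()
g.add_edge("HAND", "ARM")
wordlists = {"lang1": {"HAND", "ARM"}, "lang2": {"HAND"}}
add_coverage_percentage(g, wordlists)
print(g.nodes["HAND"]["coverage_pc"])  # 100.0
```

The percentage could then be appended to the node label or mapped to node size by whatever visualization front end renders the network.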