Closed ianjoo closed 3 years ago
What you can do, for instance, while building the colexification network (see clics colexification -h) is redirect the output to a file, for example:
clics -t 3 -f families colexification --show 3000 --format tsv > out.tsv
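Once the output has been redirected, the file can be inspected with Python's standard library. The following is only a minimal sketch: the file name out.tsv comes from the command above, the helper name read_colexification_table is hypothetical, and no particular column names are assumed because the header is read from the file itself.

```python
import csv


def read_colexification_table(path):
    """Read a tab-separated file and return (header, rows).

    `read_colexification_table` is a hypothetical helper. Column names
    are taken from the first line of the file rather than assumed.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Skip blank lines and strip padding around cells, in case the
        # writer aligned the columns with extra spaces.
        table = [[cell.strip() for cell in line] for line in reader if line]
    if not table:
        return [], []
    header, body = table[0], table[1:]
    # One dict per data line, keyed by the header cells.
    rows = [dict(zip(header, line)) for line in body]
    return header, rows


# Example usage (assumes out.tsv was produced by the command above):
# header, rows = read_colexification_table("out.tsv")
# print(header)
# print(len(rows), "rows")
```

Printing the header first is a cheap way to see which columns the export actually contains before relying on any of them.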
Thanks. But why 3000? What is the total number?
You can also just look at the GML file, which you can load with networkx or igraph in Python (igraph is also available for R), in order to browse all links in the network (which also carry metadata), so you should be able to access all colexifications.
Additionally, please check if this post by @tresoldi is useful as it treats working with CLICS data from within Python (with code published on Zenodo): https://calc.hypotheses.org/2552
Thanks. But why 3000? What is the total number?
No particular reason other than that there are roughly 3000 concepts in CLICS and that, generally speaking, the less frequent colexifications also tend to be less reliable (note, however, that the number of concepts is not the same as the number of colexifications in CLICS). network-3-families.gml has 4228 edges in total (note that this is before clustering with infomap), so in total there would be 4228 colexifications. The blog post that Mattis mentioned is a very good introduction to programmatically accessing the network data. Here's also a small snippet that shows how to access the data using igraph.
Note that the snippet is also based on @LinguList and @tresoldi's blog postings.
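As a rough illustration of this kind of programmatic access, here is a minimal sketch that uses networkx rather than igraph. It is not the snippet referred to above. The function name list_colexifications is hypothetical, and the attribute names "Gloss" (on nodes) and "FamilyWeight" (on edges) are assumptions about the GML file; if they are absent, the code falls back to the node label and a weight of 0.

```python
import networkx as nx


def list_colexifications(gml_path, weight_key="FamilyWeight", gloss_key="Gloss"):
    """Return (edge_count, links) for a colexification network in GML.

    `weight_key` and `gloss_key` are assumed attribute names; adjust
    them to whatever the file actually contains.
    """
    graph = nx.read_gml(gml_path)
    links = []
    for source, target, attrs in graph.edges(data=True):
        # Fall back to the node label if no gloss attribute is present.
        gloss_a = graph.nodes[source].get(gloss_key, source)
        gloss_b = graph.nodes[target].get(gloss_key, target)
        links.append((gloss_a, gloss_b, attrs.get(weight_key, 0)))
    # Put the most frequently attested links first.
    links.sort(key=lambda link: link[2], reverse=True)
    return graph.number_of_edges(), links


# Example usage (assumes the GML file is in the working directory):
# count, links = list_colexifications("network-3-families.gml")
# print(count)
# for gloss_a, gloss_b, weight in links[:10]:
#     print(gloss_a, gloss_b, weight)
```

The returned edge count can be compared against the 4228 edges mentioned above, and each edge's full metadata is available from graph.edges(data=True) if more than the weight is needed.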
There is also some code from the "semantic distance" work that I presented at SLE2019 and discussed in another CALC blog post: https://github.com/tresoldi/semantic_distance
I think what you want is something similar to the full list ( https://github.com/tresoldi/semantic_distance/blob/master/data/colexifications.tsv ), but you should really compute it yourself, and @chrzyki's snippet is clear. The data in this repository is outdated and includes all possible colexifications, including those found only between a single pair of languages, so there is a lot of noise in there.
Closing this for now. Feel free to reopen should any other questions arise.
What is the terminal command that allows me to download all the colexifications, containing: