clics / clicsbp

CLDF dataset on Body Part Colexifications
Creative Commons Attribution 4.0 International

Run colexification analysis on one family only with extra code and the framework discussed. #8

Closed by LinguList 2 years ago

LinguList commented 2 years ago

And also run the clustering/infomap etc.

LinguList commented 2 years ago

@AnnikaTjuka, you can run this analysis now as:

$ cldfbench clicsbp.colexifications

The optional --acd-threshold parameter sets the threshold for cognate detection:

$ cldfbench clicsbp.colexifications --acd-threshold=0.25

Let us use this thread to discuss more parameters.

LinguList commented 2 years ago

The output is written to output/colexifications.tsv, which contains the Infomap clusters. In this file, 0 marks missing data for a concept. Links are annotated in much the same way as we annotated them for some datasets in norare, so the network can also be stored in this format, at least to some degree.
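
As a quick sanity check on the output, one could count how often 0 appears per column. This is only a minimal sketch: the exact column layout of colexifications.tsv is not described in this thread, so the column names in the sample below (LANGUAGE, ARM, HAND) are hypothetical, and the function simply treats every cell equal to "0" as missing.

```python
import csv
import io
from collections import Counter


def count_missing(handle, missing="0"):
    """Count, per column, how many cells of a TSV equal the missing marker."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        for column, value in row.items():
            if value == missing:
                counts[column] += 1
    return counts


# Hypothetical sample; the real file may use different columns.
sample = io.StringIO(
    "LANGUAGE\tARM\tHAND\n"
    "lang_a\t1\t1\n"
    "lang_b\t0\t2\n"
    "lang_c\t0\t0\n"
)
print(count_missing(sample))
```

For the real file, the same function can be called on `open("output/colexifications.tsv", newline="", encoding="utf-8")` in place of the in-memory sample.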

LinguList commented 2 years ago

Currently, we only run the analysis for "body parts", but if you check the following line:

https://github.com/clics/clicsbp/blob/b279c7ba2db943dd583ae40e3ca2899f41a5f40c/clicsbpcommands/colexifications.py#L157

you can see that separate analyses, such as one for color, can be included.

AnnikaTjuka commented 2 years ago

I think I'm missing something. The command cldfbench clicsbp.colexifications isn't working although I upgraded all relevant packages. I get the following error message:

 UserWarning: ImportError loading entry point clicsbp
  warnings.warn('ImportError loading entry point {0.name}'.format(ep))
usage: cldfbench [-h] [--log-level LOG_LEVEL] [-z FIELD_SIZE_LIMIT]
                 COMMAND ...
cldfbench: error: argument COMMAND: invalid choice: 'clicsbp.colexifications' (choose from 'catconfig', 'catinfo', 'catupdate', 'check', 'ci', 'cldfreadme', 'diff', 'download', 'geojson', 'info', 'ls', 'makecldf', 'media', 'new', 'readme', 'run', 'stub', 'zenodo', 'lexibank.check', 'lexibank.check_languages', 'lexibank.check_lexibank', 'lexibank.check_phonotactics', 'lexibank.check_profile', 'lexibank.db', 'lexibank.format_profile', 'lexibank.init_profile', 'lexibank.load', 'lexibank.ls', 'lexibank.makecldf', 'lexibank.readme', 'lexibank.unload', 'hsiuhmongmien.structure', 'zenodo.download', 'cldfviz.htmlmap', 'cldfviz.map')

LinguList commented 2 years ago

Can you check again? I just updated the code (also adding more families).

AnnikaTjuka commented 2 years ago

Is it possible that I need to additionally define --entry-point ENTRY_POINT? Is this like the --repos argument for Concepticon?

AnnikaTjuka commented 2 years ago

I tried uninstalling and reinstalling cldfbench, but I still get the same error message.
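
The warning "ImportError loading entry point clicsbp" suggests that the entry point is registered but that importing its module fails. A minimal sketch for surfacing the underlying exception follows. It assumes that cldfbench discovers subcommands through an entry-point group named `cldfbench.commands`; that group name is an assumption and is not confirmed anywhere in this thread.

```python
from importlib.metadata import entry_points

# Assumed entry-point group; adjust if cldfbench uses a different one.
GROUP = "cldfbench.commands"


def find_entry_points(group):
    """Return the entry points of one group on old and new Python versions."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 / 3.9 return a dict


def report(group=GROUP):
    """Try to load each entry point and record 'ok' or the error raised."""
    results = {}
    for ep in find_entry_points(group):
        try:
            ep.load()
            results[ep.name] = "ok"
        except Exception as err:  # report any failure, not only ImportError
            results[ep.name] = f"{type(err).__name__}: {err}"
    return results


if __name__ == "__main__":
    for name, status in sorted(report().items()):
        print(f"{name}: {status}")
```

Run inside the environment where cldfbench fails, this should print the actual import error for each failing entry point instead of the generic warning.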

LinguList commented 2 years ago

You installed the package, right?

$ cd clicsbp
$ pip install -e .

AnnikaTjuka commented 2 years ago

Fixed it! It was a mismatch of dependencies with networkx:

pyclics 3.0.1 requires networkx==2.1, but lingpy 2.6.8 requires networkx>=2.3

I have now upgraded networkx to 2.3 and it worked.
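
To illustrate why no single networkx version satisfies both packages, here is a small sketch that checks the two constraints quoted above. It is a simplification that only handles `==` and `>=` with purely numeric versions. The helper names are made up for this example, and real tools such as pip implement the full version-specifier rules.

```python
import re


def parse_version(text):
    """Turn '2.6.8' into (2, 6, 8) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, requirement):
    """Check an installed version against a single '==X' or '>=X' requirement."""
    match = re.fullmatch(r"(==|>=)\s*([\d.]+)", requirement.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    operator, wanted = match.group(1), parse_version(match.group(2))
    have = parse_version(installed)
    return have == wanted if operator == "==" else have >= wanted


# networkx requirements as quoted in the error message above.
requirements = {"pyclics 3.0.1": "==2.1", "lingpy 2.6.8": ">=2.3"}

for candidate in ("2.1", "2.3"):
    unmet = [pkg for pkg, req in requirements.items()
             if not satisfies(candidate, req)]
    print(f"networkx {candidate}: requirements not met for {unmet}")
```

Either candidate leaves one package unsatisfied, which is why upgrading to 2.3 only works once pyclics is no longer needed.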

LinguList commented 2 years ago

Good. pyclics won't be needed as a dependency for now. I started from a fresh venv, so this problem did not occur for me. You will see that I have already added more datasets. We should check more closely which families we want to use, etc. For now, though, the data we have should be sufficient, even if it does not always give us enough data for emotion.

AnnikaTjuka commented 2 years ago

Ah ok, I used my clics environment so pyclics was already in there. I'll set up a new clicsbp environment and test a bit more.