evolaemp / online_cognacy_ident

Fast and unsupervised methods for multilingual cognate clustering (Rama, Wahle, Sofroniev, and Jäger)
MIT License

No run_all.fish #6

Open Anaphory opened 5 years ago

Anaphory commented 5 years ago

The README refers to a fish-shell script (https://github.com/evolaemp/online_cognacy_ident/blob/3b998aeaee3e0058da5d758cefb54bc2e37a2e94/README.md#L100) that does not exist. I have converted scripts/reproduce.fish to zsh (which mostly comes down to using the cryptic line

if [[ ${ipa_datasets[(ie)$dataset]} -le ${#ipa_datasets} ]]

instead of the nice

if contains $dataset $ipa_datasets

but at least that works).
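For readers following the port: in zsh, `${array[(ie)value]}` expands to the index of the first exact match, or to one more than the array length when there is no match, which is why the test compares it against `${#ipa_datasets}`. A more readable alternative is a small helper that mimics fish's `contains`. The following is only a sketch, not part of the repository; the function body and the dataset names in the usage comment are assumptions made for illustration.

```shell
# Hypothetical helper mimicking fish's `contains`: succeeds (exit status 0)
# if the first argument equals any of the remaining arguments.
contains() {
    local needle="$1"
    shift
    local item
    for item in "$@"; do
        # Plain string comparison, no pattern matching.
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

# Usage in zsh or bash (dataset names are placeholders):
#   ipa_datasets=(dataset_a dataset_b)
#   if contains "$dataset" "${ipa_datasets[@]}"; then ... fi
```

With this in place, the condition in the zsh script reads almost like the fish original, at the cost of a linear scan over the array, which is negligible for a handful of dataset names.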

However, I noticed that the script uses the same training data, training_data/asjpv17_word_pairs.txt, for all datasets. Is that the intended use case?

PhyloStar commented 5 years ago

You are right. We don't have to use the same training data for all datasets. The algorithm should work on any dataset without supervision.

Anaphory commented 5 years ago

Cool! I have it running, just fiddling with it to get it to play nicely with CLDF and non-ASJP sound classes. I'll tell you what my results are when I have them!