Anaphory opened 5 years ago
You are right. We don't have to use the same training data for all datasets; the algorithm should work on any dataset in an unsupervised fashion.
Cool! I have it running, just fiddling with it to get it to play nicely with CLDF and non-ASJP sound classes. I'll tell you what my results are when I have them!
The README mentions a fish-shell script (https://github.com/evolaemp/online_cognacy_ident/blob/3b998aeaee3e0058da5d758cefb54bc2e37a2e94/README.md#L100) which does not exist. I have converted `scripts/reproduce.fish` to `zsh` (mostly a matter of replacing the nice fish syntax with a cryptic zsh line, but at least it works).
However, I noticed that the script uses the same training data, `training_data/asjpv17_word_pairs.txt`, for all datasets. Is that the intended use case?
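To make the question concrete, here is a minimal sketch contrasting the two set-ups: one shared training file for every dataset, versus training on each dataset itself. It is written in POSIX `sh`, so it also runs under `zsh`. The dataset paths and the `run_cognacy` helper are hypothetical placeholders, not the repository's actual interface; only `training_data/asjpv17_word_pairs.txt` comes from the script.

```sh
#!/bin/sh
# Hypothetical sketch: the dataset paths and run_cognacy are placeholders,
# not part of online_cognacy_ident's real command-line interface.

# Shared training file, used for every dataset in the reproduce script.
shared_training="training_data/asjpv17_word_pairs.txt"

# Placeholder for the real train-and-cluster invocation; it only prints
# which training data would be paired with which evaluation dataset.
run_cognacy() {
    printf 'train on %s, evaluate on %s\n' "$1" "$2"
}

for dataset in datasets/abvd.tsv datasets/ielex.tsv; do
    # Current set-up: every dataset uses the same shared training data.
    run_cognacy "$shared_training" "$dataset"
    # Unsupervised alternative: train on the dataset being evaluated.
    run_cognacy "$dataset" "$dataset"
done
```

Switching between the two set-ups only changes the first argument passed to the training step; the rest of the loop stays the same.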