LoanDB / ronataswestoldturkic

CLDF dataset derived from 'West Old Turkic' by András Róna-Tas and Árpád Berta from 2011
https://www.harrassowitz-verlag.de/title_4002.ahtml
Creative Commons Attribution 4.0 International

Add clusterwise segmentation #6

Closed: martino-vic closed this issue 2 years ago

martino-vic commented 2 years ago

Currently, segmentation happens in orthography.py. To apply cluster-wise segmentation, all it would take is to add from ipatok import clusterise and replace tokenise with clusterise in line 12. What I don't know is how to make the resulting column appear in forms.csv.
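
To illustrate what cluster-wise segmentation means here, below is a minimal, dependency-free sketch. It is not ipatok's implementation and not the code in orthography.py: the vowel inventory VOWELS and the helper clusterwise are hypothetical names introduced only for this example, and the vowel set is deliberately incomplete. The idea is that adjacent segments of the same class (vowel or consonant) are merged into one cluster.

```python
# Hypothetical, incomplete vowel inventory, used only for this illustration.
VOWELS = set("aeiouyɛɔəɨæøœɑɒʌɪʊ")


def clusterwise(tokens):
    """Merge adjacent tokens of the same class (vowel vs. consonant).

    `tokens` is a list of already segmented sounds, e.g. ["a", "l", "m", "a"].
    A token counts as a vowel if its first character is in VOWELS.
    """
    clusters = []
    prev_is_vowel = None
    for tok in tokens:
        is_vowel = tok[0] in VOWELS
        if clusters and is_vowel == prev_is_vowel:
            # Same class as the previous token: extend the current cluster.
            clusters[-1] += tok
        else:
            # Class changed (or first token): start a new cluster.
            clusters.append(tok)
        prev_is_vowel = is_vowel
    return clusters


# Plain segmentation keeps every sound separate; the cluster-wise
# version groups the consonant sequence "lm" into a single unit.
segments = ["a", "l", "m", "a"]
print(" ".join(segments))               # a l m a
print(" ".join(clusterwise(segments)))  # a lm a
```

In the dataset itself the grouping would come from ipatok's clusterise rather than from a helper like this; the sketch only shows the kind of output to expect in the extra column.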

LinguList commented 2 years ago

See my PR #8 on this ;)

martino-vic commented 2 years ago

That worked out fine, so I'm closing this issue now.