b05102139 / dialectR

Doing dialectometry in R

Applying dialectR to morphological paradigms #1

Closed jhdeov closed 1 year ago

jhdeov commented 1 year ago

I'm curious how meaningful it would be to apply dialectR's edit distance to comparing the morphology of two dialects or related languages. Say two lects have similar but non-identical morphological paradigms: for example, archaic English "you goeth" vs. modern "you go", or the Wiktionary paradigms of various Turkic languages. Have you explored that at all?

b05102139 commented 1 year ago

@jhdeov For the data, there is a project called UniMorph that has collected the Wiktionary morphological paradigms, to which the method can be applied. I have not done this myself, however. For me the question would be how to interpret the distances thus derived: for the phonetic data that comes with dialectR, groupings on dialect atlases and even the geographical distribution can serve as some form of evaluation. For Wiktionary data, given the paucity of closely related language varieties, I think the lack of such an external reference might pose an issue. I am not a morphologist, however, so: grain of salt!
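To make the idea concrete, here is a minimal sketch of how a paradigm-level distance could be computed: a plain unit-cost Levenshtein distance per paradigm cell, normalized by the longer form and averaged over the cells both lects share. This is not dialectR's own implementation; the function names, the cell labels, and the three-cell toy paradigm are all assumptions made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def paradigm_distance(lect_a: dict, lect_b: dict) -> float:
    """Mean length-normalized edit distance over shared paradigm cells.

    Hypothetical helper: each dict maps a cell label to a word form.
    """
    shared = sorted(set(lect_a) & set(lect_b))
    if not shared:
        raise ValueError("the two paradigms share no cells")
    total = 0.0
    for cell in shared:
        x, y = lect_a[cell], lect_b[cell]
        total += levenshtein(x, y) / max(len(x), len(y), 1)
    return total / len(shared)


# Toy data for illustration only, not taken from UniMorph or Wiktionary.
archaic = {"1SG.PRS": "go", "2SG.PRS": "goest", "3SG.PRS": "goeth"}
modern = {"1SG.PRS": "go", "2SG.PRS": "go", "3SG.PRS": "goes"}

print(round(paradigm_distance(archaic, modern), 3))
```

The number this produces is easy to compute, but as noted above, it is hard to say what it means without some external grouping to evaluate it against.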

b05102139 commented 1 year ago

@jhdeov One further thing to note is non-concatenative morphology: with something like vowel harmony or Semitic word formation patterns, an approach based on edit distance arguably cannot do justice to the data. LSTM autoencoders have been proposed for the same task in a paper by Coltekin et al., but with them one loses some of the nice, white-box explainability of the current method.
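A small sketch of the non-concatenative problem, assuming simplified ASCII transliterations of three Arabic forms (kataba "he wrote", kutiba "it was written", daraba "he hit"): plain edit distance gives the same score to two forms of one root and to two different roots in the same pattern. The `consonant_skeleton` helper is a hypothetical, deliberately crude stand-in for root extraction, not something dialectR provides.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def consonant_skeleton(form: str, vowels: str = "aiu") -> str:
    """Hypothetical helper: drop vowels to approximate the consonantal root."""
    return "".join(ch for ch in form if ch not in vowels)


same_root = ("kataba", "kutiba")  # root k-t-b, two vowel patterns
diff_root = ("kataba", "daraba")  # roots k-t-b and d-r-b, one vowel pattern

# Surface edit distance cannot tell the two situations apart.
print(levenshtein(*same_root), levenshtein(*diff_root))

# Comparing consonant skeletons separates them, but only because the
# root-and-pattern structure was supplied by hand.
print(levenshtein(*map(consonant_skeleton, same_root)),
      levenshtein(*map(consonant_skeleton, diff_root)))
```

Both surface distances come out equal, while the skeleton distances differ, which is the sense in which a purely string-based measure misses what a morphologist would consider the relevant contrast.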