Open skalyan91 opened 3 years ago
Here's a note
You say: fill the table with the fields "Language", "Wordform", "Source language", and "Source wordform".
Yet some Wiktionary etymology paragraphs also include information on the semantics of the source form,
e.g.
From Middle French, from Old French maisun, meson, inherited from Latin mānsiō, mānsiōnem (“abode, home, dwelling”), from maneō (“remain, stay”) (whence also French manoir).
In this particular example I guess it can be ignored, because it is redundant with the linked entry; but in some cases the quoted meaning may help disambiguate the target sense: e.g.
the entry for the tree sense specifies a certain meaning of Latin palma:
From Middle English palme, from Old English palm, palma (“palm-tree, palm-branch”), from Latin palma (“palm-tree, palm-branch, palm of the hand”), from Proto-Indo-European pl̥h₂meh₂, plām- (“palm of the hand”). Cognate with Dutch palm, German Palme, Danish palme, Icelandic pálmur (“palm”).
while palm (hand) words the semantics differently:
From Middle English palme, paume, from Old French palme, paulme, paume (“palm of the hand, ball, tennis”), from Latin palma (“palm of the hand, hand-breadth”), from Proto-Indo-European palam-, plām- (“palm of the hand”). Cognate with Ancient Greek παλάμη (palámē, “palm of the hand”), Old English folm (“palm of the hand”), Old Irish lám (“hand”).
Should these be ignored altogether by the script, or incorporated in some way (even if the reference gloss for a word is in a separate column)?
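If the glosses were to be kept, one option might be to capture them as an optional extra field next to each source form. The following is only a rough sketch, assuming the etymology is available as plain text in the style quoted above; the names `Step` and `parse_steps` are made up for illustration, and the "capitalised words = language name" heuristic would fail for source forms that are themselves capitalised (e.g. German nouns).

```python
import re
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One 'from <language> <forms> (gloss)' step (hypothetical structure)."""
    language: str
    forms: List[str]
    gloss: Optional[str]


# Boundary between steps: ", from ", "From ", ", inherited from ", ...
_STEP_SPLIT = re.compile(r",?\s*\b(?:inherited\s+)?[Ff]rom\s+")
# A gloss in Wiktionary's rendered style: (“...”)
_GLOSS = re.compile(r"\(“([^”]*)”\)")


def parse_steps(etymology: str) -> List[Step]:
    """Heuristically split the first sentence of an etymology into steps."""
    # Assumption: the derivation chain is the first sentence; cognate lists follow.
    first_sentence = etymology.split(". ")[0].rstrip(".")
    steps = []
    for chunk in _STEP_SPLIT.split(first_sentence):
        gloss_match = _GLOSS.search(chunk)
        gloss = gloss_match.group(1) if gloss_match else None
        head = chunk[:gloss_match.start()] if gloss_match else chunk
        tokens = head.split()
        # Assumption: leading capitalised tokens form the language name.
        n = 0
        while n < len(tokens) and tokens[n][0].isupper():
            n += 1
        language = " ".join(tokens[:n])
        forms = [f.strip() for f in " ".join(tokens[n:]).split(",") if f.strip()]
        if language and forms:
            steps.append(Step(language, forms, gloss))
    return steps


palm_hand = (
    "From Middle English palme, paume, from Old French palme, paulme, paume "
    "(“palm of the hand, ball, tennis”), from Latin palma "
    "(“palm of the hand, hand-breadth”), from Proto-Indo-European palam-, "
    "plām- (“palm of the hand”). Cognate with Ancient Greek παλάμη."
)
for step in parse_steps(palm_hand):
    print(step.language, step.forms, step.gloss)
```

Steps without a quoted meaning simply get `None` as their gloss, so the extra column could be ignored wherever it is not needed.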
@Tavalam This depends on how we wish to treat homophony. I think the simplest and most easily defensible approach is to not treat homophony as different from polysemy. What this would mean in the case of English palm is the following:
I think this is all the information that we need; it is unnecessary to encode the information that Middle English palme (plant) comes from Old English, whereas Middle English palme (hand) comes from Old French, since the Old English etymon only has the “plant” sense, and the Old French etymon only has the “hand” sense. In other words, we can infer the fact that the Middle English word is homonymous and not polysemous, without having to manually encode this assumption anywhere.
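To make the inference concrete, here is a small sketch of one way it could be checked mechanically. It assumes we already have, for a given word, the set of senses attested for each of its etymons; the function name `is_homonymous` and the sense labels are purely illustrative, not part of any existing script.

```python
from itertools import combinations
from typing import Dict, Set


def is_homonymous(senses_by_etymon: Dict[str, Set[str]]) -> bool:
    """True if the word has several etymons and no two of them share a sense.

    Assumption: disjoint sense sets across etymons indicate homonymy;
    a single etymon, or etymons with overlapping senses, do not.
    """
    if len(senses_by_etymon) < 2:
        return False
    return all(
        not (a & b) for a, b in combinations(senses_by_etymon.values(), 2)
    )


# Middle English palme, using the senses mentioned above.
palme = {
    "Old English palm": {"plant"},
    "Old French palme": {"hand"},
}
print(is_homonymous(palme))
```

Nothing about homonymy is stored in the table itself in this sketch; it is derived from the senses of the etymons when needed.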
I hope this makes some kind of sense.
Write a function that does the following:
<language name>
<ancestral form>
".<language name>
<ancestral form>
pairing, and use those as input for another function call. (I.e. use tail recursion.)

Once we have a function that does the above, we can just loop it over all the words in NorthEuraLex.
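A minimal sketch of how such a recursive function might look. Everything here is an assumption for illustration: `get_etymology` is a hypothetical callback standing in for whatever fetches the Wiktionary etymology text, the toy dictionary holds abbreviated stand-in strings rather than real Wiktionary content, and only the first "from ..." step of each paragraph is treated as the immediate source. Python does not optimise tail calls, so this uses ordinary recursion with a `seen` set to guard against cycles.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# (language, wordform, source language, source wordform)
Row = Tuple[str, str, str, str]

_STEP_SPLIT = re.compile(r",?\s*\b(?:inherited\s+)?[Ff]rom\s+")


def immediate_sources(etymology: str) -> List[Tuple[str, str]]:
    """Return (language, form) pairs from the first usable 'from ...' step."""
    first_sentence = etymology.split(". ")[0].rstrip(".")
    for chunk in _STEP_SPLIT.split(first_sentence):
        tokens = chunk.split("(")[0].split()  # drop any parenthesised gloss
        # Assumption: leading capitalised tokens form the language name.
        n = 0
        while n < len(tokens) and tokens[n][0].isupper():
            n += 1
        language = " ".join(tokens[:n])
        forms = [f.strip() for f in " ".join(tokens[n:]).split(",") if f.strip()]
        if language and forms:
            return [(language, form) for form in forms]
    return []


def trace(
    language: str,
    wordform: str,
    get_etymology: Callable[[str, str], Optional[str]],
    seen: Optional[Set[Tuple[str, str]]] = None,
) -> List[Row]:
    """Collect table rows for a word and, recursively, for its sources."""
    if seen is None:
        seen = set()
    if (language, wordform) in seen:
        return []
    seen.add((language, wordform))
    text = get_etymology(language, wordform)
    if not text:
        return []
    rows: List[Row] = []
    for src_language, src_form in immediate_sources(text):
        rows.append((language, wordform, src_language, src_form))
        rows.extend(trace(src_language, src_form, get_etymology, seen))
    return rows


# Toy stand-in for Wiktionary lookups (abbreviated, not real page content).
TOY = {
    ("English", "palm"): "From Middle English palme, from Old English palm, from Latin palma.",
    ("Middle English", "palme"): "From Old English palm, from Latin palma.",
    ("Old English", "palm"): "From Latin palma.",
}


def toy_lookup(language: str, wordform: str) -> Optional[str]:
    return TOY.get((language, wordform))


# Looping over a word list (here a one-item placeholder for NorthEuraLex).
table: List[Row] = []
for lang, word in [("English", "palm")]:
    table.extend(trace(lang, word, toy_lookup))
for row in table:
    print(row)
```

Passing the lookup in as a parameter keeps the parsing and recursion testable without network access; a real scraper could be swapped in later.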