Closed MedKhem closed 4 years ago
I think this is the way to go. I always thought it was strange to treat homographs with different parts of speech as different senses of a prototypical pos-less headword. Even though in some dictionaries they may appear like senses (i.e. they could be numbered etc.) I think this is a much cleaner way to model these.
I would go so far as to say that we should recommend that, in all those cases where one entry contains multiple parts of speech (à la arrest-v. and arrest-n.), we try to treat them as nested entries and not as senses.
I think it would be a bit arrogant to try and cut off part of the European lexicographic tradition due to some technical difficulties in a format which need not survive the coming ten years, so I trust that this is not the gist of the proposal here. After all, string identity is a pretty strong measure, and a justifiable choice for where POS identification has to allow a smaller or larger degree of arbitrariness (what's the POS of "near", please?). It is one thing to recommend choices of vocabulary (and their mapping to the local element/attribute choices) to lexicographers facing the task of retrodigitization, and something totally different to force them into micro- and macrostructural choices that they may not wish to make (or that they may not have the right to make). Cheers!
That's also related to #14 where the discussion is based on the Dutch achter example.
@bansp We shouldn't forget that our aim with Lex0 is primarily to provide a somewhat general baseline encoding that caters for the vast majority of lexical models. There will always remain cases where Lex0 will not suffice. In the case at hand, Lex0 still allows entry/sense/gramGrp for the entry provided by @MedKhem. We're not going to »cut« that »off«. However, with recursive entry, modelling this as entry/entry/gramGrp also becomes possible. The question rather is whether to recommend the latter. To me too, arrest, noun and arrest, verb seem (and smell and taste) like individual entries.
And then again … »What's in a TEI name?«, anyway, remember? ;) If it looks like an entry, why not call it an entry?
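A minimal sketch of the entry/sense/gramGrp path that remains available, assuming a bare-bones entry for arrest. The xml:id values and the definitions are made up for illustration and are not taken from any dictionary, and the fragment has not been validated against the Lex0 schema.

```xml
<!-- Sketch only: ids and definitions are illustrative assumptions -->
<entry xml:id="arrest" xml:lang="en">
  <form type="lemma">
    <orth>arrest</orth>
  </form>
  <!-- the part of speech is carried by each sense, not by the entry -->
  <sense xml:id="arrest.1" n="1">
    <gramGrp>
      <pos>v.</pos>
    </gramGrp>
    <def>to seize someone by legal authority</def>
  </sense>
  <sense xml:id="arrest.2" n="2">
    <gramGrp>
      <pos>n.</pos>
    </gramGrp>
    <def>the act of seizing someone by legal authority</def>
  </sense>
</entry>
```

Here the verb and the noun look like numbered senses of one POS-less headword, which is exactly the reading that the nested-entry recommendation would avoid.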
@bansp who's "cutting off" parts of the European lexicographic tradition? I have no idea what you're talking about.
In #43 we made entry a member of sense.Part so that we can have entry wherever re used to be. And here we're talking about recommending entry/entry/gramGrp for homographic entries with different parts of speech, as in arrest noun and arrest verb (as opposed to entry/sense/gramGrp).
Anywho. This issue will remain open until we finalize the stuff we started talking about in our last meeting in Berlin (collocs, MWEs, typology of entries), and then we'll see how all these mutually related issues work together.
Hi,
Not sure how I found myself on this list except for my interest in TEI Lex0. Not sure why it landed on this address either as it is not my lexicographical one.
I cannot but agree with Piotr. I have worked on dictionaries in many languages and from many different centuries, and nowhere in any lexicographical tradition has entry been a child of sense. The example given is the sort of thing that English learners’ dictionaries might play with, and would be easily handled as a pair of related entries under entry in the same way as in French I have
I cannot see how a change made for surely technical reasons can be anything but detrimental, and one that the lexicographical community, including retrodigitisers, would find reprehensible.
But I may be speaking out of turn.
Geoffrey
On 14 Feb 2019 at 19:04, Piotr Banski notifications@github.com wrote:
I think it would be a bit arrogant to try and cut off the part of European lexicographic tradition due to some technical difficulties in a format which need not survive the coming ten years, so I trust that this is not the gist of the proposal here. After all, string identity is a pretty strong measure, and a justifiable choice for where POS identification has to allow a smaller or larger degree of arbitrariness (what's the POS of "near", please?). It is one thing to recommend choices of vocabulary (and their mapping to the local element/attribute choices) to lexicographers facing the task of retrodigitization, and something totally different to force them into micro- and macrostructural choices that they may not wish to make (or that they may not have the right to make). Cheers!
In the printed edition of the Portuguese Academy Dictionary (2001), grammatical homonyms are separated into different entries (see below). In the first encoding, we used:
<group>
<gramGrp>adj.</gramGrp>
<sense revisto="14/03/2017" novo="05/07/2016">
<def>Relativo ou pertencente a Albânia (país da Europa).</def>
</sense>
</group>
<group>
<gramGrp>n. m., f.</gramGrp>
<sense revisto="14/03/2017" novo="14/03/2017">
<def>Natural, habitante ou cidadão da Albânia.</def>
</sense>
</group>
<group>
<gramGrp>n. m.</gramGrp>
<sense revisto="14/03/2017">
<def>Língua indo-europeia falada principalmente na Albânia.</def>
</sense>
</group>
Now, the cases of homonymy are encoded as follows:
Os jurados pronunciaram-se a favor de uma sentença capital para o criminoso.
Preocupação capital. Assunto de interesse capital.
<entry xml:id="DACL.CAPITAL:2" xml:lang="pt">
<form type="lemma">
<orth>capital</orth>
<lbl>:2</lbl>
<pron>kɐpitˈał</pron>
</form>
<gramGrp>
<pos>n.</pos>
<gen>f.</gen>
</gramGrp>
<sense xml:id="DACL.CAPITAL.6" n="1">
<def>Cidade onde está situada a sede administrativa de um país, província, região... </def>
<cit type="example">
<quote>Duarte, um mês depois, era preso, interrogado, e remetido para a capital, onde a identidade da pessoa foi de muitos reconhecida.</quote>
<bibl><author>CAMILO</author>, <title>As Três Irmãs</title>, <citedRange>151</citedRange></bibl>
</cit>
<entry xml:id="DACL.CAPITAL.8." xml:lang="pt">
<form><orth>+ de distrito.</orth></form>
</entry>
</sense>
<sense xml:id="DACL.CAPITAL.9" n="2">
<def>Cidade que constitui o centro de uma actividade.</def>
<cit type="example">
<quote>Diz-se que Paços de Ferreira é a capital do móvel.</quote>
</cit>
</sense>
<sense xml:id="DACL.CAPITAL.10" n="3">
<def>Letra maiúscula; letra maiúscula que inicia um capítulo.</def>
</sense>
<etym>
<seg type="desc">Do</seg>
<cit type="etymon"> <lang>la.</lang> <form><orth xml:lang="la">capitālis</orth></form>
</cit>
</etym>
</entry>
My goodness, I have managed to overlook e-mails with the replies, and from all the reactions I infer that I was entirely wrong thinking that I made my stance clear. Apologies for what must have seemed an incoherent message followed by silence. I'll say even more, and I do that cringing: upon re-reading Mohamed's message, and the entries that followed, I now fully understand that I was the sole cause of the misunderstanding, and I am now triply embarrassed. I am not even sure if I should "elucidate" what I meant, given that what I meant was based on a misconception for which I alone am to blame. Heartfelt apologies to all involved (and a promise to myself to stop thinking that I can procrastinate one job by "making a quick stab" at another). (OK, so just a quickie: I thought, wrongly, that you guys were pondering recommending an actual change of the macrostructure based on, let's call it, "lemmatization strategy" of the original author. Obviously, no one did that and I should have read the first two messages much more carefully, rather than focus on a single passage cut out of the whole.)
@WGBS2 Dear Geoffrey, I apologise for leading you astray with my message that I only now identified as fully incoherent (and therefore open to various interpretations, of which you chose one).
You point out an important thing that we were also alerted to by Katrien, repeatedly, namely that in our more or less innocent modelling strategies, we should pay very close attention to the feelings of born and bred lexicographers, maybe not necessarily bending some modelling decisions, but certainly very clearly explaining the difference between well-established lexicographic vocabulary on the one hand, and the vocabulary of a very restricted set of TEI-XML modelling choices on the other. What is well established as a concept in one realm (e.g. "entry") need not correspond one-to-one to the name-of-an-element in the other realm. In other words, the TEI XML "entry" is an element name that only in a subset of cases corresponds directly to the lexicographic concept of the entry. Thank you for reminding us of the need to be very careful here.
As for the e-mail address that this goes to, it must be the address associated with your GitHub account, and adding your account to this group was Toma's only way of including you in the GitHub environment. You might want to either change the address associated with the "WGBS2" account, or create another account with a different e-mail address, and then let Toma know about it. (The latter might be a suboptimal choice, however, and cumbersome in the long run.) Or you might want to filter messages from GitHub based on the "notifications@github.com" sender.
thank you @bansp for clarifying the misunderstanding.
@anacastrosalgado thank you for sharing your examples. The first one seems to further illustrate the issue raised here: \<group> has been used to play the role of \<entry>, as \<entry> had not yet been made recursive.
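A rough sketch of how that first example might look once \<entry> is recursive, with each \<group> turned into a nested \<entry>. The headword "albanês" and all xml:id values are assumptions added for illustration (the snippet above does not show them), the split of the grammatical labels into \<pos> and \<gen> is one possible reading, and the project-specific revisto/novo attributes are left out.

```xml
<!-- Sketch only: the headword and all xml:id values are assumed, not taken from the source data -->
<entry xml:id="DACL.ALBANES" xml:lang="pt">
  <form type="lemma">
    <orth>albanês</orth>
  </form>
  <!-- each former <group> becomes a nested <entry> with its own gramGrp -->
  <entry xml:id="DACL.ALBANES.adj" xml:lang="pt">
    <gramGrp>
      <pos>adj.</pos>
    </gramGrp>
    <sense xml:id="DACL.ALBANES.adj.1">
      <def>Relativo ou pertencente a Albânia (país da Europa).</def>
    </sense>
  </entry>
  <entry xml:id="DACL.ALBANES.n1" xml:lang="pt">
    <gramGrp>
      <pos>n.</pos>
      <gen>m., f.</gen>
    </gramGrp>
    <sense xml:id="DACL.ALBANES.n1.1">
      <def>Natural, habitante ou cidadão da Albânia.</def>
    </sense>
  </entry>
  <entry xml:id="DACL.ALBANES.n2" xml:lang="pt">
    <gramGrp>
      <pos>n.</pos>
      <gen>m.</gen>
    </gramGrp>
    <sense xml:id="DACL.ALBANES.n2.1">
      <def>Língua indo-europeia falada principalmente na Albânia.</def>
    </sense>
  </entry>
</entry>
```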
For the second example, I'm not sure if it's related to the case of homographs we are discussing here. Maybe you could elaborate on this? :)
This is now well-documented. See https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#nested-entries-vs-multiple-senses
In the example below, modelling the entry with 2 senses, differentiated by POS, will lead us to the same issue as in #43, where we need an \<entry> inside \<sense>, which is not a valid TEI option and looks a bit weird to my eyes (and Toma's).
A possible modelling would be to enable nested entries and to consider the combination of \<gramGrp> and \<sense> as an \<entry>:
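As a hedged illustration of that idea, here is a minimal sketch for arrest in which each pairing of \<gramGrp> and \<sense> is wrapped in its own nested \<entry>. The xml:id values and definitions are invented for the example, and whether the nested entries should also repeat the \<form> is left open.

```xml
<!-- Sketch only: ids and definitions are illustrative assumptions -->
<entry xml:id="arrest" xml:lang="en">
  <form type="lemma">
    <orth>arrest</orth>
  </form>
  <!-- verb homograph: gramGrp + sense grouped as a nested entry -->
  <entry xml:id="arrest-v" xml:lang="en">
    <gramGrp>
      <pos>v.</pos>
    </gramGrp>
    <sense xml:id="arrest-v.1">
      <def>to seize someone by legal authority</def>
    </sense>
  </entry>
  <!-- noun homograph: same structure, so both are treated alike -->
  <entry xml:id="arrest-n" xml:lang="en">
    <gramGrp>
      <pos>n.</pos>
    </gramGrp>
    <sense xml:id="arrest-n.1">
      <def>the act of seizing someone by legal authority</def>
    </sense>
  </entry>
</entry>
```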
This way, both of the homographs are treated equally and entries and senses (in dictionaries where lexicographers represent homographs as separate articles) could be easily mapped to constructs from the same category.
What do you think about this?