DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Mentioned in etym #143

Closed (ambs closed this issue 3 years ago)

ambs commented 3 years ago

According to the current schema, it looks like mentioned is not allowed inside etym. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#index.xml-body.1_div.6_div.1 gives the list of allowed elements, and mentioned is not one of them. Nevertheless, mentioned is used in the example that follows that list. This should be fixed.

I would also like to take this chance to ask what the suggested replacement for mentioned is when annotating a foreign word (origin). Thank you.

ttasovac commented 3 years ago

Dear Alberto,

the section you are referring to is not showing a list of allowed elements in TEI Lex-0, but discussing what options exist in TEI Guidelines themselves.

I'm not seeing the example in which mentioned is used — it may have been swallowed up by GitHub's formatting. If you are by chance here referring to the example in https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#TEI.etym, i.e. in the Elements specification, this is by default taken over from TEI itself and has long been a source of annoyance for me because I know this is super confusing for the users. We should probably try to find a way to overwrite the examples in the elements specification with our own, but I haven't had time to do that — and it would be quite a lot of work to do it for each element we allow in TEI Lex-0.

Finally, as for the TEI Lex-0 recommendation for etymologies, a few of us have worked on that but the results are still only in a paper, have not been fully discussed with the wider group and have not been distilled for the Guidelines. But you can check out the paper here:

https://hal.inria.fr/hal-03108781

I will close the issue now, but feel free to reopen if you have further questions.

xlhrld commented 3 years ago

(I was going to close this, too. But anyway, I'll leave some further hints since I wrote this already.)

There is ongoing work in modeling the etymology section in TEI Lex-0. The main ticket in this regard is #26.

The general approach will be: use cit instead of the severely restricted mentioned. cit allows for a much more detailed representation of what's mentioned, including grammatical properties, definitions or quotations. The markup becomes a bit more complex, though, e.g. (minimally):

<cit type="etymon">
  <form>
    <orth>mentioned_word</orth>
    <!-- possibly variants -->
  </form>
  <!-- possibly grammatical properties via ./gramGrp -->
  <!-- possibly definitions via ./def -->
  <!-- … -->
</cit>

laurentromary commented 3 years ago

I have the feeling I would know at what time to start our TEI Lex-0 meetings ;-). But yes, the more structured representation mentioned by @xlhrld allows one to search etymological content precisely, which is what we need across varieties of lexical sources.
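
As a rough illustration of that kind of precise search, here is a minimal Python sketch that pulls every etymon out of entries encoded with the `cit type="etymon"` pattern and optionally filters by language. The sample entries, the `etymons` function name, and the use of `xml:lang` on `cit` are assumptions made for the example, not part of the TEI Lex-0 Guidelines; the TEI namespace is also left out to keep the paths short.

```python
import xml.etree.ElementTree as ET

# Toy sample, assumed for illustration only (no TEI namespace declared).
SAMPLE = """
<body>
  <entry xml:id="e1">
    <form type="lemma"><orth>algebra</orth></form>
    <etym>
      <cit type="etymon" xml:lang="ar"><form><orth>al-jabr</orth></form></cit>
    </etym>
  </entry>
  <entry xml:id="e2">
    <form type="lemma"><orth>alcohol</orth></form>
    <etym>
      <cit type="etymon" xml:lang="ar"><form><orth>al-kuhl</orth></form></cit>
    </etym>
  </entry>
  <entry xml:id="e3">
    <form type="lemma"><orth>window</orth></form>
    <etym>
      <cit type="etymon" xml:lang="non"><form><orth>vindauga</orth></form></cit>
    </etym>
  </entry>
</body>
"""

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def etymons(root, lang=None):
    """Return (lemma, etymon, language) tuples, optionally filtered by language."""
    results = []
    for entry in root.iter("entry"):
        # The lemma is the <orth> of the <form> that is a direct child of <entry>.
        lemma = entry.findtext("form/orth")
        for cit in entry.findall(".//etym/cit[@type='etymon']"):
            cit_lang = cit.get(XML_NS + "lang")
            if lang is not None and cit_lang != lang:
                continue
            results.append((lemma, cit.findtext("form/orth"), cit_lang))
    return results


ROOT = ET.fromstring(SAMPLE)
print(etymons(ROOT, lang="ar"))
```

With a flat mentioned element the same query would have to rely on surrounding text; here the lemma, the etymon, and its language are each addressable by path.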

ambs commented 3 years ago

Thanks for the suggestions and pointers.

My main concern is that some words mentioned in etymologies (word origins) are repeated many times across the dictionary, so it would not make sense to attach grammatical properties or definitions to each occurrence. Such a mention could probably be seen more as a link (well, probably a broken link) to that word in some other dictionary or resource. This would mean the dictionary is not self-contained, but it would have the benefit of guaranteeing no duplicate information in the resource.
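
One way to picture the "link" idea is a minimal Python sketch in which each etymon carries only its form plus a pointer, and entries are grouped by the pointer they share. The `corresp` attribute on `cit`, the `ext:` pointer prefix, the placeholder words, and the `entries_by_etymon_target` function are all assumptions made for illustration; nothing here is a TEI Lex-0 recommendation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder data: word_a and word_b share one etymon, referenced by pointer only.
SAMPLE_LINKED = """
<body>
  <entry xml:id="a">
    <form type="lemma"><orth>word_a</orth></form>
    <etym>
      <cit type="etymon" corresp="ext:mentioned_word"><form><orth>mentioned_word</orth></form></cit>
    </etym>
  </entry>
  <entry xml:id="b">
    <form type="lemma"><orth>word_b</orth></form>
    <etym>
      <cit type="etymon" corresp="ext:mentioned_word"><form><orth>mentioned_word</orth></form></cit>
    </etym>
  </entry>
  <entry xml:id="c">
    <form type="lemma"><orth>word_c</orth></form>
    <etym>
      <cit type="etymon" corresp="ext:other_word"><form><orth>other_word</orth></form></cit>
    </etym>
  </entry>
</body>
"""


def entries_by_etymon_target(root):
    """Map each pointer value (hypothetical corresp) to the lemmas that cite it."""
    targets = defaultdict(list)
    for entry in root.iter("entry"):
        lemma = entry.findtext("form/orth")
        for cit in entry.findall(".//etym/cit[@type='etymon']"):
            target = cit.get("corresp")
            if target:  # skip etymons that carry no pointer
                targets[target].append(lemma)
    return dict(targets)


LINKED_ROOT = ET.fromstring(SAMPLE_LINKED)
print(entries_by_etymon_target(LINKED_ROOT))
```

In this arrangement grammatical properties and definitions would live once, at whatever the pointer resolves to, while each etym holds only the form and the reference.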