DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

abbr and expan are now allowed (in dev branch only) #176

Closed ttasovac closed 1 year ago

ttasovac commented 1 year ago

With 142f9f78cb870ce13d0a74171d0df3793626ac7b abbr and expan are now allowed in TEI Lex-0 dev-0.9.2. We have an obvious use case in the front matter of the Morais dictionary, and, really, pretty much any other dictionary out there: all kinds of abbreviations are usually listed before the "proper" dictionary content.

Remember, this is about representing the content of the dictionary front matter the way the original author(s) created it — it's not about taxonomies etc. which we deal with in the header, and which we can point to from those simple lists...
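For illustration, a front-matter abbreviation list of this kind might be encoded roughly as follows. This is only a sketch: the `div` type value and the two list items are invented placeholders, not taken from the Morais dictionary or from the TEI Lex-0 guidelines.

```xml
<front>
  <!-- type="abbreviations" is a hypothetical value chosen for this sketch -->
  <div type="abbreviations">
    <head>Abbreviations</head>
    <list>
      <!-- each item pairs the printed abbreviation with its printed expansion -->
      <item>
        <abbr>adj.</abbr>
        <expan>adjective</expan>
      </item>
      <item>
        <abbr>Med.</abbr>
        <expan>Medicine</expan>
      </item>
    </list>
  </div>
</front>
```

The list only mirrors what is printed in the source. Any link to a taxonomy in the header would be added separately.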

So far, so good. The thing is: once you allow abbr and expan in the core module, they will pop up in the content models of a bunch of dictionary-specific elements as well. And I don't like it.

For instance, while <usg type="domain"><abbr>Med.</abbr></usg> wouldn't be technically wrong because "Med." is an abbreviation, it would be lexicographically irrelevant and also superfluous considering that we can have @expand on usg to begin with, not to mention @norm and @value etc. And we've been recommending things like <gram type="pos">n.</gram> from the beginning, so what would be the purpose of having <gram type="pos"><abbr>n.</abbr></gram>? It would only make processing more difficult.
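The contrast can be put side by side. This is a minimal sketch: the `expand` and `norm` values are assumptions chosen for illustration.

```xml
<!-- Pattern recommended so far: the abbreviation stays plain text,
     and attributes carry the expansion or normalization -->
<usg type="domain" expand="Medicine">Med.</usg>
<gram type="pos" norm="NOUN">n.</gram>

<!-- Now also valid in the dev branch, but it adds an extra layer
     that every consumer of the data has to unwrap -->
<usg type="domain"><abbr>Med.</abbr></usg>
<gram type="pos"><abbr>n.</abbr></gram>
```

In the second pattern the text is one level deeper, so a stylesheet or query that reads the text content of `usg` or `gram` directly would have to account for both shapes.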

But I don't want to rush anywhere with this. And this would certainly require some additional discussion. I am leaving this issue open as a reminder to myself (and anybody else who may be interested) to think about working out a general strategy for allowing or not allowing abbr and expan in dictionary-specific elements...

abbr and expan are members of model.pPart.editorial. They get into the content model of dictionary-specific elements via model.phrase and macro.lexicalParaContent

  1. model.phrase → model.pPart.edit → model.pPart.editorial and
  2. macro.lexicalParaContent → model.phrase → model.pPart.edit → model.pPart.editorial

Dictionary-specific elements with model.phrase in content model

Dictionary-specific elements with macro.lexicalParaContent in content model

anacastrosalgado commented 1 year ago

@ttasovac I was checking, and for cases such as "v. reciproco" or "v. ativo", I think we will need <gram type="pos"><abbr>xxx</abbr></gram>.

Instead of:

<item>
  <gram type="pos" norm="VERB">
    <abbr>V.</abbr>
  </gram>
  <gram type="reflexivity">reciproco</gram>
  <p>.</p>
</item>

Even for the abbreviations list, could we have?

<gram type="pos" norm="VERB">v.</gram>
<gram type="reflexivity">reciproco</gram>

ttasovac commented 1 year ago

Not sure what you mean by xxx, but yes, I would go for reflexivity as a gram type in those cases. That, however, we should discuss in the issue raised by Jesse; I just didn't have time to deal with it this morning. I'm at the airport now...

ttasovac commented 1 year ago

But in abbreviation lists, I would not use gram at all - just list and item, and then abbr and expan.

anacastrosalgado commented 1 year ago

Sorry. The xxx was <gram type="pos"><abbr>xxx</abbr></gram>.

Originally, I have this:

<item>
  <abbr type="POS" norm="verb">V.</abbr>
  <subc>recipr.</subc>
  <expan>Verbo reciproco</expan>
  <p>.</p>
</item>

If we change subc, don't we need a gram type?

HAVE A GOOD TRIP!

ttasovac commented 1 year ago

Ana, you're mixing up two issues. What I wrote about above is that reintroducing abbr and expan has an undesired consequence that they are now allowed inside gram and other dictionary-specific elements. But that has nothing to do with your simple lists of abbreviations. So forget about norm, subcategorization etc. You don't need any of that.

You want a simple abbreviation and expansion, and you want to type your abbreviations (but that's because you want to do that for your dictionary; it's not required by TEI Lex-0).

<item>
  <abbr type="gram">V. recipr.</abbr>
  <expan>Verbo reciproco</expan>
  <pc>.</pc>
</item>

anacastrosalgado commented 1 year ago

Ok, understood. I will remove it. My fault.

xlhrld commented 1 year ago

> The thing is: once you allow abbr and expan in the core module, they will pop up in the content models of a bunch of dictionary-specific elements as well. And I don't like it.

In principle, count me in on this, @ttasovac!

But … we should also take things like //choice/{sic,corr} into consideration when discussing //abbr. Entries may contain typos and other errors that we want to retain, especially in faithful digitizations of printed dictionaries. The modeling issue is essentially the same for abbr and sic: in both cases we introduce a secondary annotation tier on top of the primary lexicographical annotation. That's always a hassle with inline markup.
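As a concrete illustration of that second tier, a retained typo inside a definition could look like this in plain TEI P5. The wording of the definition is invented, and whether TEI Lex-0 should allow `choice` in this position is exactly the open question.

```xml
<!-- P5-style sketch: the printed error is kept in sic, the editorial fix in corr -->
<def>a small domesticated
  <choice>
    <sic>carnivorus</sic>
    <corr>carnivorous</corr>
  </choice>
  mammal</def>
```

As with `abbr` nested inside `gram`, the text of `def` is no longer a single string, so processing has to decide which reading to follow.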

ttasovac commented 1 year ago

Axel, you are alive! So happy to hear from you.

You are absolutely right, we have a need for correcting typos in the Portuguese dictionary we're working on as well. I'm knee-deep in Horizon Europe applications until March 9th, so I will only be able to come back to this after that... but I will definitely count on your help! :)

ttasovac commented 1 year ago

Ok, so here's what I did. As I explained above, model.phrase and macro.paraContent were the culprits for a great number of crazy elements that appear inside dictionary elements. abbr and expan were just the tip of the iceberg: despite all of the customizations of TEI Lex-0, we were still inheriting affiliation, idno, email, all sorts of names etc. from P5 inside form, orth etc.

This has been bugging me for a long time. Dictionary elements need their own phrase-level class and a para-level macro, because dictionary elements are not your regular paragraphs. That's why I have created macro.lexicalParaContent and model.lexicalPhrase to reduce the abundance of stuff allowed by P5 inside dictionary elements.
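In ODD terms, the general shape of such a customization might look like the sketch below. It is not the actual TEI Lex-0 ODD: only the two identifiers `model.lexicalPhrase` and `macro.lexicalParaContent` come from this thread, while the choice of `pc` as a class member and of `gram` as the element being changed are assumptions made for illustration.

```xml
<!-- A new, deliberately small phrase-level class for dictionary elements -->
<classSpec ident="model.lexicalPhrase" type="model" mode="add">
  <desc>groups phrase-level elements that make sense inside dictionary elements</desc>
</classSpec>

<!-- A para-level macro: text mixed with members of the new class only -->
<macroSpec ident="macro.lexicalParaContent" mode="add">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <classRef key="model.lexicalPhrase"/>
    </alternate>
  </content>
</macroSpec>

<!-- Assumption: pc is one of the elements admitted to the new class -->
<elementSpec ident="pc" module="analysis" mode="change">
  <classes mode="change">
    <memberOf key="model.lexicalPhrase" mode="add"/>
  </classes>
</elementSpec>

<!-- Assumption: gram is switched over to the restricted macro -->
<elementSpec ident="gram" module="dictionaries" mode="change">
  <content>
    <macroRef key="macro.lexicalParaContent"/>
  </content>
</elementSpec>
```

Because elements have to opt in to the new class, anything not explicitly added (affiliation, idno, email and so on) simply stops being available inside the dictionary elements that use the macro, while regular paragraphs keep their P5 content model.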

One day, when we have more time, we can explore to what extent this would be worth discussing with the Council, but for now, we have a mechanism for making the content models of dictionary elements lexicographically more appropriate without breaking anything in regular paragraphs, which we use in front matter etc.

I will close this issue for now. @xlhrld, we'll address choice, sic and corr separately...