LingSIG / wordAttributes

work space for a coherent proposal for inline attributes of <w> in TEI XML

@reg #4

Open TomazErjavec opened 7 years ago

TomazErjavec commented 7 years ago

@reg (for the normalized form in e.g. historical corpora)

I see a number of problems with introducing this attribute:

So, I'd propose sticking to <choice> with <orig> and <reg>, with these then containing <w> etc. Yes, it is somewhat verbose, but it covers all the cases (except for discontinuous elements, but that is a whole dimension of extra complication).
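To make the <choice> pattern concrete, here is a minimal sketch in Python (standard library only). The sample sentence, the tokens in it and the `tokens` helper are invented for illustration and are not taken from the proposal; the sketch only assumes the usual TEI namespace and that <choice> sits directly inside the sentence element.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical sample: one plain word, one 1:1 regularization,
# and one case where two original tokens map to one regularized token.
SAMPLE = """
<s xmlns="http://www.tei-c.org/ns/1.0">
  <w>the</w>
  <choice>
    <orig><w>olde</w></orig>
    <reg><w>old</w></reg>
  </choice>
  <choice>
    <orig><w>to</w><w>day</w></orig>
    <reg><w>today</w></reg>
  </choice>
</s>
"""


def tokens(sentence, branch):
    """Collect <w> text, following only `branch` ('orig' or 'reg') inside <choice>."""
    out = []
    for child in sentence:
        if child.tag == f"{{{TEI_NS}}}choice":
            picked = child.find(f"tei:{branch}", NS)
            if picked is not None:
                out.extend(w.text for w in picked.iter(f"{{{TEI_NS}}}w"))
        elif child.tag == f"{{{TEI_NS}}}w":
            out.append(child.text)
    return out


sentence = ET.fromstring(SAMPLE.strip())
original_stream = tokens(sentence, "orig")
regularized_stream = tokens(sentence, "reg")
print(original_stream)
print(regularized_stream)
```

The verbosity mentioned above shows in the sample: every regularized token needs its own <choice> wrapper with two branches, but the two streams do not have to line up token by token.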

bansp commented 7 years ago

Thanks so much for this note, Tomaž -- it's been in our focus for the past few days, as we are streamlining the ticket, and it has helped us to make sure how we see the role of @reg among the other proposals.

Below is (more or less) my personal take:

This is still being debated, so I'm posting this as a partial and subjective reply. [This note was re-edited after a telco with Susanne, in which we discussed some fringe cases]

eduarddrenth commented 7 years ago

Could you please take a close look at https://bitbucket.org/teibestpractices/linguistic-customization, which I think is very much related to this (but perhaps I miss the point here)? It also covers discontinuous words (solution copied from Tomaz). The solution was developed in close cooperation with linguists and is very usable.

bansp commented 7 years ago

This is just to signal that we believe we're done discussing these points amongst ourselves and this is now reflected in the ticket, in the section on @reg. We mostly agree with the key quote: "this cannot be accommodated in this proposal"

and our response is hopefully visible in the current form of the ticket (it wasn't so well visible in the earlier versions): the proposal is limited in scope. It is not meant to replace the <choice> mechanism, but rather to offer an alternative, wherever feasible.

All we want to say is that <choice> can be a pain, because of sub-word content. @reg removes much or all of that pain, in some contexts.
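As a rough illustration of that point, the sketch below (Python, standard library only) reads a sentence in which the regularized form is carried by @reg directly on <w>. The sample data and the `surface` and `regularized` helpers are invented for this example, and @reg is used here as the attribute under discussion in this ticket, not as an established part of TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: the third word has sub-word content (a line break),
# which would have to be handled in both branches of a <choice>.
SAMPLE = """
<s xmlns="http://www.tei-c.org/ns/1.0">
  <w>the</w>
  <w reg="old">olde</w>
  <w reg="today">to<lb/>day</w>
</s>
"""


def surface(w):
    """Surface form: all text inside <w>, ignoring sub-word markup."""
    return "".join(w.itertext())


def regularized(w):
    """Regularized form: @reg if present, otherwise the surface form."""
    return w.get("reg", surface(w))


words = list(ET.fromstring(SAMPLE.strip()).iter(f"{{{TEI_NS}}}w"))
surface_stream = [surface(w) for w in words]
regularized_stream = [regularized(w) for w in words]
print(surface_stream)
print(regularized_stream)
```

The sub-word markup stays in one place, inside the single <w>, and the regularized reading is a plain attribute lookup. The limit is equally visible: this only works where one original <w> corresponds to one regularized form.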

Privately, I suspect that <choice> also has limits: examples can probably be found where one would have to push an entire sentence into the branches of <choice>, because of how word order and lemmatization can interplay in contexts of split verbs and the like. The proper solution to that would be standoff annotation. The proposal here, however, is a way to provide for resources that are not encoded as standoff and yet could benefit from some linguistic markup, encoded in an orderly way rather than by grabbing whichever of @ana, @corresp or the meaningless @type happens to be unused in the given resource.

bansp commented 7 years ago

Hi Eduard, I only now realised that you have posted the comment in this thread (I think I failed to refresh the page, and read your comment in the mail). I still owe you a reply in issue #6. If you meant to say that your proposal addresses @reg (the topic of this issue) then I admit that I fail to see how and would be grateful for some more in-depth explanation.

eduarddrenth commented 7 years ago

Thanks. In my response I do not address the @reg issue. Honestly, I do not understand this discussion in depth. I am actually just letting you know of our way to add linguistic annotations to material, in the hope that others may benefit from our efforts. Integrating TEI and Universal Dependencies in particular could, I think, be of value. Keep up the good work!

TomazErjavec commented 7 years ago

@bansp thanks for the lengthy and illuminating rebuttal of my points - I can only say that I agree with you: indeed, simplicity by definition cannot handle all complexity. But as long as you make it clear in the documentation what can and cannot be done and give clear guidelines on how to use the proposal (e.g. that the teiHeader should document which stream the annotation pertains to), it should be fine - so I'll close the issue.

@eduarddrenth thanks for sharing the bitbucket proposal. It looks nice, except for the use of @rendition; why not use @part instead, cf. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.fragmentable.html

eduarddrenth commented 7 years ago

Thanks, the @rendition on the homepage isn't part of what I propose, but thanks for the @part pointer.

Once this feature request is accepted, I will migrate to it where possible.

Bye

bansp commented 6 years ago

Hi Tomaž, I'll re-open this just so that we don't forget to make sure to address your initial points in further work.