TomazErjavec opened 7 years ago
Thanks so much for this note, Tomaž -- it's been in our focus for the past few days, as we are streamlining the ticket, and it has helped us clarify how we see the role of @reg among the other proposals.
Below is (more or less) my personal take:
@reg is just as bad or just as good as @lemma: both break some implied TEI principles and are clumsy as XML objects to which @xml:lang cannot apply (luckily, however, wherever they are used or proposed, the xml:lang information is conveniently redundant).

@reg and @lemma are utility devices, to be used when convenient -- and we are simply arguing that in many cases, @reg as an attribute is a quick and convenient means to an end.

@reg (which is tei:text), by analogy to multi-word lemmas (unless those multi-word @regs are then expected to be subject to further analysis, but this is what we see as the border of the convenience system; there are many "if"s here, but these "if"s can be turned into an algorithm, we think, and some paths through this algorithm are nice and flowery, some can only be tackled with choice/reg, and for some, neither solution is particularly helpful).

@reg can take you through both paths).

@reg system -- @reg is not supposed or expected to embrace them.

This is still being debated, so I'm posting this as a partial and subjective reply. [This note was re-edited after a telco with Susanne, in which we discussed some fringe cases]
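To make the parallel between @reg and @lemma concrete, here is a minimal sketch of a single token carrying both as attributes. It assumes the @reg attribute on <w> as proposed in this ticket (not an attribute of the released TEI P5 schema), and the early-modern spelling "vse" is only an illustrative choice. Like @lemma, the proposed @reg value is plain text: it can carry neither markup nor a language tag of its own.

```xml
<!-- Sketch only: @reg on <w> is the attribute proposed in this ticket,
     not part of the released TEI P5 schema; @lemma is existing TEI -->
<w lemma="use" reg="use">vse</w>
```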
Can you please take a close look at https://bitbucket.org/teibestpractices/linguistic-customization, which I think is closely related to this (though perhaps I am missing the point here)? It also covers discontinuous words (solution copied from Tomaz). The solution was developed in close cooperation with linguists and is very usable.
This is just to signal that we believe we're done discussing these points amongst ourselves, and this is now reflected in the ticket, in the section on @reg. We mostly agree with the key quote:
"this cannot be accommodated in this proposal"
and our response is hopefully visible in the current form of the ticket (it wasn't so well visible in the earlier versions): the proposal is limited in scope. It is not meant to replace the <choice> mechanism, but rather to offer an alternative wherever feasible.
All we want to say is that <choice> can be a pain, because of sub-word content. @reg removes much or all of that pain, in some contexts.
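A hedged illustration of the sub-word problem: when the original form contains markup below the word level (here a <g> element for a long s, pointing at a hypothetical character declaration "#long-s"), <choice> splits the token into two parallel <w> elements, one per branch, whereas the proposed @reg keeps a single token with its sub-word markup in place. Both encodings are sketches; @reg on <w> is the attribute under discussion, not an existing TEI attribute.

```xml
<!-- Option A: <choice>, with one <w> in each branch -->
<choice>
  <orig><w>v<g ref="#long-s">ſ</g>e</w></orig>
  <reg><w>use</w></reg>
</choice>

<!-- Option B: the proposed (hypothetical) @reg on a single <w>;
     "#long-s" is a hypothetical glyph/char declaration -->
<w reg="use">v<g ref="#long-s">ſ</g>e</w>
```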
Privately, I suspect that <choice> also has limits, i.e., that examples can be found where one would have to push an entire sentence into the branches of <choice>, because of how word order and lemmatization could interplay in contexts of split verbs and the like. The proper solution to that would be to resort to standoff annotation, but the proposal here is a way to provide for resources that are not encoded as standoff and yet could benefit from a measure of linguistic markup, encoded with a measure of order rather than by grabbing at whichever of @ana, @corresp or the meaningless @type happens to be unused in the given resource.
Hi Eduard, I only now realised that you had posted your comment in this thread (I think I failed to refresh the page, and read your comment in the e-mail). I still owe you a reply in issue #6.
If you meant to say that your proposal addresses @reg (the topic of this issue), then I admit that I fail to see how and would be grateful for a more in-depth explanation.
Thanks. My response does not address the @reg issue; honestly, I do not understand this discussion in depth. I am just letting you know about our way of adding linguistic annotation to material, in the hope that others may benefit from our efforts. In particular, I think integrating TEI and Universal Dependencies could be of value. Keep up the good work!
@bansp thanks for the lengthy and illuminating rebuttal of my points. I can only say that I agree with you: simplicity by definition cannot handle all complexity. But as long as you make it clear in the documentation what can and cannot be done, and give clear guidelines on how to use the proposal (e.g. that the teiHeader should document which stream the annotation pertains to), it should be fine, so I'll close the issue.
@eduarddrenth thanks for sharing the bitbucket proposal. It looks nice, except for the use of @rendition: why not use @part instead? Cf. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.fragmentable.html
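For discontinuous words, a minimal sketch of the @part approach, using values defined by att.fragmentable ("I" = initial part, "F" = final part). The German separable verb and the @lemma values are illustrative assumptions; linking the two fragments to each other is not covered by this sketch.

```xml
<!-- "Er ruft mich an" ("He calls me"): the separable verb "anrufen"
     is encoded as two <w> fragments, marked with @part -->
<s xml:lang="de">
  <w lemma="er">Er</w>
  <w part="I" lemma="anrufen">ruft</w>
  <w lemma="ich">mich</w>
  <w part="F" lemma="anrufen">an</w>
</s>
```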
Thanks. The @rendition on the homepage isn't part of what I propose, but thanks for the @part pointer.
When this feature request is accepted, I will migrate to it where possible.
Bye
Hi Tomaž, I'll re-open this just so that we don't forget to address your initial points in further work.
I see a number of problems with introducing this attribute:
@reg will seldom use <g>; it seems bad practice to break a convention to accommodate one case only.
@reg.
So, I'd propose sticking to <choice> with <orig> and <reg>, with these then containing <w> etc. Yes, it is somewhat verbose, but it covers all the cases (except for discontinuous elements, but that is a whole dimension of extra complication).
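A minimal sketch of what this looks like in practice, for a case that is awkward for an attribute sitting on a single <w>: two original tokens normalised to one word. The spelling "to morrow" → "tomorrow" is an illustrative assumption, not an example from the thread.

```xml
<!-- <orig> and <reg> each carry their own tokenisation, so the number
     of <w> elements may differ between the two branches -->
<choice>
  <orig><w>to</w> <w>morrow</w></orig>
  <reg><w>tomorrow</w></reg>
</choice>
```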