TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

`<ab>` should be able to nest #1856

Closed martindholmes closed 2 years ago

martindholmes commented 5 years ago

The <ab> element "(anonymous block) contains any arbitrary component-level unit of text, acting as an anonymous container for phrase or inter level elements analogous to, but without the semantic baggage of, a paragraph." In other words, it's a block-level element with no semantics.

Many block-level elements (such as <div>) can nest, and this makes perfect sense; chapters contain sections contain subsections. One that cannot nest is <p>, and this is because of its semantics; we claim that the thing called a paragraph is a specific kind of thing that cannot directly contain other instances of itself.

<ab> is not a paragraph; if you're tagging a paragraph, you should use <p>. <ab> explicitly has no semantics other than its block-ness. So why can't it nest? Its analogue at the inline level, <seg>, can nest. Imagine you're tagging a text consisting of a sequence of emojis that you don't understand, but they're clearly arranged in nested blocks, which you can tell because of indenting. You can't use <div> because that's making semantic claims about the nature of the blocks; you would surely want to use <ab>, and have it nest.

This ticket arises out of a Stylesheets ticket (https://github.com/TEIC/Stylesheets/issues/328) where we're trying to make the TEI Lite output of odd2lite.xsl actually valid. That output contains lots of nested <ab>s; rather than try to rewrite the process to avoid <ab>s, which would be quite difficult, it seems to me that we could solve that particular problem by letting <ab>s nest, which they should be able to do anyway.

bansp commented 5 years ago

One argument against this that I can think of is the sort of argument that I resent: one that puts artificial metaconstraints on the encoder for fear that the encoder will "create a mess" if too much is allowed (I think that arguments of that sort are intellectually fake). Another potential counterargument is that one should use <seg> for subdivisions of <ab>, but that in turn could in many cases mean abusing <seg> only because nested <ab> is not available. In other words, +1 from me.

lb42 commented 5 years ago

It's debatable whether or not <ab> is analogous to <seg>: it might just as plausibly be argued that the right phrase-level analogue is <s> which doesn't self-nest. But the argument from analogy is a little dubious anyway. I don't understand why you wouldn't use <div> here if you want a self-nesting structure. The whole point of <div> is that it does NOT make any "semantic claims"! Wouldn't this address the problem raised by https://github.com/TEIC/Stylesheets/issues/328 equally well?

duncdrum commented 5 years ago

Just talking from my own use cases here, I favour nesting <ab>. What a paragraph is isn't always clear-cut in languages with continuous scripts and without sentence markers, which is why I prefer <ab> over <p>. Being able to nest them would make sense. <seg> and <s> are fun tags for sure, but I don't quite see what their semantics have to do with <ab>.

martindholmes commented 5 years ago

@lb42 Using <div> isn't practical because most of these instances occur inside table cells. And <div> has very clear semantics:

<div> (text division) contains a subdivision of the front, body, or back of a text.

<ab> and <seg> are both introduced by the linking module, and their definitions both eschew semantics -- <seg> is "(arbitrary segment) represents any segmentation of text below the ‘chunk’ level", whatever the chunk level might be (but arguably anonymous block).

lb42 commented 5 years ago

Ah, I missed the point that these are instances within table cells. But in that case, what's wrong with using <seg>? I suppose my concern about permitting nesting of <ab> is that this kind of application was not envisaged for this element. Its purpose was to provide a way of marking up things that, had they been definitely prose or verse, would have been marked as <p> or <l>: e.g. verses of the Bible, or passages in early printed plays where deciding between prose and verse is problematic.
<s> also -- in your lovely phrase -- "eschews semantics" : it's for any non-nesting end to end segmentation.

martindholmes commented 5 years ago

@lb42 In the specific case of odd2lite.xsl, we can't use <seg> because these are definitely blocks. And I don't think <s> is free of semantics:

(s-unit) contains a sentence-like division of a text.

"Sentence-like" (along with the element name itself, which I believe is taken from the word) seems pretty loaded to me. <s> is not for typographical blocks.

The point really is that odd2lite.xsl is using the Lite output as a sort of rendered output (precursor to HTML etc.), so Sebastian was really thinking typographically when he wrote it.

lb42 commented 5 years ago

Well, I am not sure I agree that <seg> is inappropriate here. And I still think Sebastian was wrong to use <ab> in this way. Better either to use <seg type="typographicBlock"> or invent a new <block> element.

martindholmes commented 5 years ago

@lb42 I just don't understand why you would want to create a new element. This seems to me to be pretty much what anonymous block is for. Let's say you're encoding this:

😀😀😀😀😀😀😀😀😀😉
😉😉😃😃😃😃😃😃😃😃
😃😃😃😃😄😄😄😄😄😄
    😄😄😄😄😉😉
    😁😁😁😁😁😁
    😁😁😁😁😁😁
😆😆😆😆😆😆😆😆😆😉
😆😉😅😅😅😅😅😅😅😅
😅😉😅😉🤣🤣🤣🤣🤣🤣
🤣🤣😉😉😉😉😂😂😂😂
😂😂😂😂😂😂😉😂🙂🙂
🙂🙂🙂🙂🙂🙂🙂😉😉😉
🙃🙃🙃🙃🙃🙃🙃🙃😉😉

I'd say you have two blocks, one nested inside the other, but you know nothing about the content so you can't claim they're divisions. They're just blocks. And <seg> is not a block; it's below the "chunk" level.
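If <ab> were allowed to contain <ab>, the emoji text could be encoded roughly as follows. This is a sketch only: it will not validate against an unmodified TEI P5 schema, and rend="indent" is an illustrative value chosen here, not one defined by the Guidelines.

```xml
<!-- Sketch: only valid under a customization that lets <ab> contain <ab>. -->
<ab>😀😀😀😀😀😀😀😀😀😉
  <lb/>😉😉😃😃😃😃😃😃😃😃
  <lb/>😃😃😃😃😄😄😄😄😄😄
  <!-- The indented run is treated as a block nested inside the outer one;
       rend="indent" (illustrative value) records how it was set off. -->
  <ab rend="indent">😄😄😄😄😉😉
    <lb/>😁😁😁😁😁😁
    <lb/>😁😁😁😁😁😁
  </ab>
  <lb/>😆😆😆😆😆😆😆😆😆😉
  <lb/>😆😉😅😅😅😅😅😅😅😅
  <lb/>😅😉😅😉🤣🤣🤣🤣🤣🤣
  <lb/>🤣🤣😉😉😉😉😂😂😂😂
  <lb/>😂😂😂😂😂😂😉😂🙂🙂
  <lb/>🙂🙂🙂🙂🙂🙂🙂😉😉😉
  <lb/>🙃🙃🙃🙃🙃🙃🙃🙃😉😉
</ab>
```

The markup says nothing about what the blocks are; it records only the encoder's judgement that one block sits inside another.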

lb42 commented 5 years ago

Why two blocks rather than three? or 13? If you know nothing about the content, how can you plausibly claim to know that there is a smaller block embedded within a larger outer one -- rather than, say, a large block, followed by a small one, followed by a third large one?

This is why (dare I suggest) descriptive markup shouldn't be used to capture formatting artefacts...

martindholmes commented 5 years ago

@lb42 It's not descriptive markup though. The only thing it says is "this is a block". If I decided that there were thirteen blocks, some nested inside others, I'd still need to nest my <ab>s.

Encoders draw conclusions about the nature of things based on their formatting all the time. When the only conclusion you can draw is that something is a block, you use <ab>. And if you draw the conclusion that there are nested blocks, you should use nested <ab>s.

Why do you think <ab>s shouldn't nest? I don't see any basis for claiming that, because they're not anything which has any properties that would prevent nesting. You seem to feel that <ab>s have some semantics that would prevent it, like <p>s do, but they're explicitly not <p>s, by definition.

duncdrum commented 5 years ago

Isn’t the question rather how Martin can adequately express his linguistic knowledge that the given text consists of three nested elements, none of which is a paragraph or div? Just for fun, let’s play find the div.

@lb42 So if we're not using markup for formatting artifacts, what about marginalia or footnotes? It seems that the formatting there is kind of important.

lb42 commented 5 years ago

The description for <ab> reads "contains any arbitrary component-level unit of text, acting as an anonymous container for phrase or inter level elements analogous to, but without the semantic baggage of, a paragraph."

The issue here is what exactly does "analogous to" mean and indeed what counts as "semantic baggage". I think my objection is that I don't consider the ability to self nest (or not) to be part of the semantic baggage, and I do think that "analogous to" implies that there should be consistency in the structural properties of the elements concerned. These are both value judgments, obviously, and I don't expect them to be universally shared. But that's why I feel uneasy at using <ab> in this way, quite aside from the procedural vs descriptive markup issues.

martindholmes commented 5 years ago

I think you're arguing on my side. It's precisely the semantic baggage of <p> that makes it non-nesting. And I think Duncan's documents make a very convincing case for the existence of nesting blocks that aren't divs or paragraphs (which are pretty Eurocentric notions anyway). A markup language shouldn't reflect what you think documents ought to be like; it should enable you to express what they actually turn out to be like.

larkvi commented 5 years ago

If there is going to be an element called 'arbitrary block,' I think that it should do what it says on the tin. Otherwise, the name is teaching the user something untrue about its use. I would favour making <ab> truly arbitrary.

Also, with semantic enrichment, it seems to me that there would be a lot of content markup which is not clearly related or parallel to any of the typesetting-derived block elements, and which might be minimally modelled as an <ab> with an RDF target or an xml:id for stand-off markup?

hcayless commented 5 years ago

Some previous discussion: http://tei-l.970651.n3.nabble.com/lt-ab-gt-within-lt-ab-gt-td4026246.html

martindholmes commented 5 years ago

I'm going to try to summarize Council's lengthy discussion today so it's not necessary to go through it all again when the ticket is addressed anew. I'll try to be as objective as possible even though I have a strong opinion.

I think we fall into three camps. Camp 1 believes absolutely that <ab> should never nest, because although it's not a <p>, it shares with <p> the fact that it's a /chunk/, as defined in Chapter 1 of the Guidelines:

https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html#STBTC

"...which can appear directly within texts or within divisions of them, but not (usually) within other chunks."

In addition, Camp 1 claims that making <ab> nestable will cause lots of problems for people who have already written processing code that relies on the fact that it's not nestable. Camp 1 thinks that if a nestable, semantics-free block is required, then either people should use <seg>, or a new element should be created.

Camp 2 looks at the definition of "chunk" above and points out that the word "usually" is right there in the definition; that other chunks such as <listBibl> and <quote> already nest; and that the processing implications are not nearly as bad as Camp 1 believes. They also point out that TEI's own Stylesheets produce nested <ab>s on the way to the HTML Guidelines, and it would be mildly preferable if even intermediate TEI forms created during processing were also valid. Camp 2 is mildly in favour of making <ab> nestable, but not strongly enough to pick a fight with Camp 1. They would also be OK with creating a new element, but it's very difficult to think of an element name which doesn't cause confusion alongside <ab>; it would have to have a name that was somehow more specific, but its behaviour is actually looser, so users would be puzzled.

Camp 3 believes that it is the very semantics of <p> that prevent it from nesting, so if <ab> doesn't carry those semantics, then it should be able to nest; that in any case the word "usually" is there for a reason; and that this same request has come up several times before, in multiple contexts with different types of encoding and text. They argue that making <ab> nestable has no meaningful side-effects unless you choose to use it; that processors handling <ab> are in any case handling it in a very generic way that would probably not break if it were nestable; and that any new element would only lead to confusion because you would be arbitrarily choosing between two elements that are (necessarily) quite vaguely described and seem to have everything in common except this one weird behavioural quirk.
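To illustrate the Camp 3 point about generic processing, here is a minimal, hypothetical XSLT 2.0 fragment (not taken from the TEI Stylesheets) that renders every <ab> as an HTML <div>. Because the template hands its content back to xsl:apply-templates, a nested <ab> simply matches the same rule again and comes out as a nested <div>; nothing in it assumes that <ab> is flat.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Each <ab> becomes a generic HTML block. An <ab> inside another <ab>
       is reached through apply-templates and handled by this same rule,
       so nesting produces nested <div>s without any special case. -->
  <xsl:template match="tei:ab">
    <div class="ab">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Line beginnings become HTML line breaks. -->
  <xsl:template match="tei:lb">
    <br/>
  </xsl:template>
</xsl:stylesheet>
```

Code that does depend on flatness (for example, code that extracts the full text of each <ab> as one unit) is the kind Camp 1 is worried about; a recursive template like this one is not.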

For @joeytakeda and the LEMDO project, it seems unlikely that this will be resolved any time soon, so the best advice is either to switch to using <seg>, or to continue with the current customization that does allow <ab> to nest. It might be worth approaching the membership through TEI-L to see how much support for nestable <ab>s there is, to strengthen the argument a little for the next time it's discussed.
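A project customization of the kind mentioned here might look roughly like the following ODD sketch. Assumptions: the schemaSpec identifier tei_nestedAb is hypothetical, the module list is pared down (linking and core are both included so that <ab> is available whichever module defines it), and the alternation only approximates a paragraph-like content model with <ab> added. It should be checked against the <ab> definition in the TEI release actually in use before being adopted.

```xml
<schemaSpec ident="tei_nestedAb" start="TEI" xmlns="http://www.tei-c.org/ns/1.0">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="linking"/>
  <moduleRef key="gaiji"/>
  <!-- Change only the content model of <ab>; everything else is inherited. -->
  <elementSpec ident="ab" mode="change">
    <content>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <textNode/>
        <classRef key="model.gLike"/>
        <classRef key="model.phrase"/>
        <classRef key="model.inter"/>
        <classRef key="model.global"/>
        <classRef key="model.lLike"/>
        <elementRef key="lg"/>
        <!-- The one addition: an <ab> may contain further <ab>s. -->
        <elementRef key="ab"/>
      </alternate>
    </content>
  </elementSpec>
</schemaSpec>
```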

duncdrum commented 5 years ago

camp 3

jamescummings commented 5 years ago

Note that, for 'camp 1', the benefit of a new element (for those who don't want to use the already nestable <seg>, which is appropriate if something is nested inside a chunk-level element) is that <ab> keeps its current behaviour: if we allow <ab> to nest, we also remove this restriction, which is a useful thing that people may have relied upon.

peterrobinson commented 5 years ago

A lot of shadowboxing going on here. I find it very hard to imagine a real-world scenario where one actually wants to nest <ab> elements. And I have been using them a very long time, if that counts for anything. The example Martin gives is not convincing, absent a firm statement that there is something here which cannot be expressed adequately except by nesting <ab> elements. What is that something? It is rather telling that this request arose from a problem with processing ODD documents, and not from some fundamental semantic issue. As Martin Holmes stated, even in this case the problem could be resolved without nesting <ab> elements; it would just be difficult.

I am not against doing things for processing convenience (I do it all the time, myself). But one expects at least a gesture towards a reason beyond "it would make it easier for me". And thinking about it: the various collation systems I've worked with over the years actually assume that there are three elements which do not nest, and which do not contain higher-level nesting objects. Accordingly, the content of these elements may be passed to a collation system. These three elements are <l>, <p>, and <ab>. There seems to me to be a nice symmetry about these three, which would be broken if <ab> is allowed to nest.

So I am definitely in Camp 1. +1. It may be that there are real arguments for <ab> to be nestable, beyond "it would make things easier for me". Let us hear those arguments.
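For pipelines like the collation workflows described here, one way to keep relying on flat <ab>s, whatever the schema permits, would be a project-level Schematron check run before the content is handed on. The sketch below is an assumption about how a project might do this, not an existing TEI rule; the pattern id no-nested-ab is hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern id="no-nested-ab">
    <!-- Fires on every <ab>; the assertion fails if that <ab> sits
         anywhere inside another <ab>, at any depth. -->
    <sch:rule context="tei:ab">
      <sch:assert test="not(ancestor::tei:ab)">This ab is nested inside another ab; the downstream process assumes that ab does not self-nest.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```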

martindholmes commented 5 years ago

The use-case is a project encoding primary sources for early modern drama, where the encoders are encountering many nested block-type things which they are reluctant to identify specifically (as e.g. line, line-group, or paragraph).

I think they would like to be able to distinguish (nestable) block-like things from nestable inline-like things, and assumed that <ab> would be the appropriate element for that (contrasting with <seg>).

Cheers, Martin

peterrobinson commented 5 years ago

I don’t find this persuasive, either. There is a big difference between saying “I see this thing and it is a nested <ab>” and “I see things nested and I don’t know what they are. I’ll code them as <ab> elements and figure out what they really are later”.

In the use case Martin gives here, one could use <seg> perfectly adequately. If you wanted to distinguish some <seg>s as block-like (in the practical sense of being marked by a carriage return etc. before or after), then add a rend attribute. I understand that there is an appealing simplicity in using <ab> for those chunks which have a carriage return, and <seg> for those that don't. Against that, the point of <ab> is precisely that it specifies a block which may, or may not, have carriage returns about it (as with <l> and <p>). One might become ethically disturbed at this restricted use of <ab>. But oh well.

In any event, my vote stays with Camp 1 here. My thought is that this scenario (I have lots of things here, some with carriage returns about them, some not, some which look to be nested, etc.) looks tailor-made for the anonymity of <seg>. Again, using <ab> this way loses the analogy with <l> and <p>. I am presuming that when the encoders do get to decide what is nesting etc., they will express this through the usual hierarchies of (for example) div-p or div-p-seg or div-ab or div-lg-l or div-lg-l-seg etc. That is, the finished encoding will NOT have <ab> elements nesting. The <ab>s are only nesting at a preliminary stage. So why not let <seg> do this work all the way through, and leave <ab> alone?

P

joeytakeda commented 5 years ago

Thanks @martindholmes for the summary of council's conversation about the issue! And thanks to council and the rest of the commenters for this lively and challenging discussion!

For some background: I'm working with Martin on the TEI Lite ticket and I am the programmer for the EM Drama project he cites above, so I can provide a bit more detail as to why the drama project uses nested <ab>.

We are encoding editions of early modern drama, which include "Old Spelling" versions and "Modernized" versions: the critical difference here is that the old spelling editions are necessarily agnostic regarding whether the contents of a speech are prose or verse. (Note that we are tagging speeches as such.) That distinction between prose/verse is made in the modern edition by an expert on the text since the identification of prose/verse is a critical action that requires a whole suite of research, expertise et cetera, which is better suited for the modernized version of the text. (Which is precisely the type of documented use-case for <ab> in the Guidelines: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#index-egXML-d53e123618.)

Following this logic, we do not tag what some might argue are quotations, songs, et cetera, which often appear within the text. We can ascertain from the bibliographic codes (indentation, et cetera) that something is meant to go within, but we cannot ascertain why it goes within. So in cases where a song or a quotation might be embedded in a speech, which continues after the embedded thing, we still want to signal our judgement that something is nested within something else, but we do not want to make a claim about what that thing is.

Let's take the "All the glisters..." verse from Merchant of Venice as an example:

[Image: MVF1-1856]

One way to tag that speech might be to understand it (as perhaps a modern editor would) as a piece of prose with a verse quotation embedded within it:

<sp>
  <speaker>Mor.</speaker>
  <p>O hell! what haue we here, a carrion death,
    <lb/>Within who<g ref="g:longS">ſ</g>e emptie eye there is a written <g ref="g:longS">ſ</g>croule;
    <lb/>Ile reade the writing.
    <quote>
      <lg>
        <l>All that gli<g ref="g:longS">ſ</g>ters is not gold,</l>
        <!--More from the quoted verse here-->
      </lg>
    </quote>
  </p>
</sp>

Or, if we thought (for whatever reason) the contents of the speech is actually verse and not prose (or verse inadvertently set like prose), we could encode the contents like so:

<lg>
  <l>O hell! what haue we here, a carrion death,</l>
  <l>Within who<g ref="g:longS">ſ</g>e emptie eye there is a written <g ref="g:longS">ſ</g>croule;</l>
  <l>Ile reade the writing.</l>
  <lg type="embeddedVerse">
    <l>All that gli<g ref="g:longS">ſ</g>ters is not gold,</l>
    <!--More here-->
  </lg>
</lg>

In any case, we are able to say this block exists within this block in perfectly valid TEI.

However, our attempt to generalize this markup to follow the scholarly protocols set by the editorial team fails, since we cannot nest <ab>:

<sp>
  <speaker>Mor.</speaker>
  <ab>O hell! what haue we here, a carrion death,
    <lb/>Within who<g ref="g:longS">ſ</g>e emptie eye there is a written <g ref="g:longS">ſ</g>croule;
    <lb/>Ile reade the writing.
    <ab>
      <lb/>All that gli<g ref="g:longS">ſ</g>ters is not gold,
      <!--More from the quoted verse here-->
    </ab>
  </ab>
</sp>

In other words, the claims we want to make here are precisely about the blockness of some textual feature we encounter. As suggested above, we can certainly do something like <seg type="block"> or <seg style="display:block;">, but it feels wrong when we have an element whose name suggests that it is a block without semantic baggage.
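For comparison, the <seg type="block"> alternative mentioned here would look something like this, and it is valid TEI today. The value "block" is a project-chosen type assumed for illustration, not one defined by the Guidelines.

```xml
<sp>
  <speaker>Mor.</speaker>
  <ab>O hell! what haue we here, a carrion death,
    <lb/>Within who<g ref="g:longS">ſ</g>e emptie eye there is a written <g ref="g:longS">ſ</g>croule;
    <lb/>Ile reade the writing.
    <!-- The embedded passage as a typed <seg> rather than a nested <ab>;
         type="block" is a project convention (assumed), not TEI-defined. -->
    <seg type="block">
      <lb/>All that gli<g ref="g:longS">ſ</g>ters is not gold,
      <!--More from the quoted verse here-->
    </seg>
  </ab>
</sp>
```

The structure is the same as the nested-<ab> version; the difference is that block-ness now lives in an attribute value on a phrase-level element, which is the mismatch the comment objects to.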

(As an aside, @duncdrum's earlier comment provides another good example as to why the anonymity of anonymous block is necessary for a wide range of texts: https://github.com/TEIC/TEI/issues/1856#issuecomment-464855158)

ebeshero commented 4 years ago

Council asks @martinascholger to share her example where this would have been appropriate for her project encoding.

JanelleJenstad commented 2 years ago

@hcayless posted this message to TEI-L

I'd like your advice on a piece of markup. The document in question is a bilingual Arabic / Ancient Greek protocol (sort of a cover sheet for a bunch of legal documents). It looks like this: https://papyri.info/ddbdp/cpr;3;38

To give you a better idea of what's going on if you don't read Greek and/or Arabic:

(ar) In the name of God, the compassionate, the merciful (grc) In the name of God the compassionate, the benevolent There is no God but God alone. Muhammad is the messenger of God (ar) There is no God but Allah alone with no partner He has not begotten, neither was he begotten and he has none that equal him [ gap ] (grc) ... to the true faith (ar) Muhammad is the Messenger of God, He has sent him with guidance and true religion Abdullah Al-Walid Commander of the Faithful (grc) Abdella Aloulid Emir Almoumnin (Greek transliteration of the Arabic) Abdella, son of Abdelmalik, counselor(?) (ar) This is what the Emir Abdullah bin Abd al-Malik ordered. In the year eighty-nine

My question is this: currently this whole thing is wrapped in an <ab>. Would it be appropriate to wrap each individual block of Greek / Arabic in its own <ab>? Like so:

<ab xml:lang="ar"><lb n="1"/>In the name of God, the compassionate, the merciful</ab> <ab xml:lang="grc"><lb n="2"/>In the name of God the compassionate, the benevolent <lb n="3"/>There is no God but God alone. Mohammed is the messenger of God</ab> ...

and so on. As an argument in favor, I'd say these are certainly visually distinct blocks of text. Against, that the Arabic and Greek are both running texts, interleaved, and so treating them as blocks is maybe torturing the text a little too much. We could use @next and @prev to connect them, but I'm not sure I'd bother.

The advantage of using <ab>s is that formatting the LTR and RTL texts properly becomes fairly trivial, whereas now it's quite painful.
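A sketch of why per-language blocks make bidirectional formatting easy: once each language run is its own <ab> with @xml:lang, a single template can set the HTML lang and dir attributes per block. This is a hypothetical XSLT 2.0 fragment; treating only Arabic ('ar') as right-to-left is an assumption that fits this particular document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Each language-tagged <ab> becomes its own HTML block, so the writing
       direction can be set per block rather than inside running text. -->
  <xsl:template match="tei:ab[@xml:lang]">
    <div class="ab" lang="{@xml:lang}"
         dir="{if (@xml:lang = 'ar') then 'rtl' else 'ltr'}">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Line beginnings become HTML line breaks. -->
  <xsl:template match="tei:lb">
    <br/>
  </xsl:template>
</xsl:stylesheet>
```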

JanelleJenstad commented 2 years ago

And @martindholmes replied: I'd go for nested ab elements for this. Each language text-run is clearly a block from the layout point of view, but they don't fit into the conventional block categories (div, p, etc.), so I think this is precisely what ab is for. We should allow nested abs for this sort of scenario, and we have a couple of customizations for projects that need them already.
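Applied to the protocol, that suggestion would give something like the following, reusing the text from the message above. The outer element stands for the existing wrapping <ab>, and the markup assumes one of the customizations mentioned that permit <ab> within <ab>.

```xml
<!-- Assumes a customization that allows <ab> inside <ab>. -->
<ab>
  <ab xml:lang="ar"><lb n="1"/>In the name of God, the compassionate, the merciful</ab>
  <ab xml:lang="grc"><lb n="2"/>In the name of God the compassionate, the benevolent
    <lb n="3"/>There is no God but God alone. Mohammed is the messenger of God</ab>
  <!-- ... further alternating Arabic and Greek blocks ... -->
</ab>
```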

lb42 commented 2 years ago

I remain unconvinced. The only cogent argument I see here against using <seg> for the individual language chunks seems to be that you want them formatted like blocks (though you also want to show that they are linked). This is not persuasive: you could easily indicate the blockishness of the source in any number of different ways. And some of them, I think, are actually not formatted as blocks anyway.

lb42 commented 2 years ago

To address @joeytakeda's example above: I would tag this as two consecutive <ab> elements, if I really don't want to assert that the first one is a <p> and the second a <quote>. They are both wrapped in an <sp>, so there is no need for an extra <ab> to wrap them together.

<sp>
  <speaker>Mor.</speaker>
  <ab>O hell! what haue we here, a carrion death,
    <lb/>Within who<g ref="g:longS">ſ</g>e emptie eye there is a written <g ref="g:longS">ſ</g>croule;
    <lb/>Ile reade the writing.
  </ab>
  <ab>
    <lb/>All that gli<g ref="g:longS">ſ</g>ters is not gold,
    <!--More from the quoted verse here-->
  </ab>
</sp>

martindholmes commented 2 years ago

@lb42 It's obviously possible to find awkward workarounds for every individual case, but there's no logical reason why <ab>s shouldn't nest just like <div>s can. By definition they have no semantics, so there's no justification for precluding their nesting. If I say something in my text is an anonymous block, that's my judgement, and if I perceive that such blocks are in fact nesting in my text, then I should be able to express that in my code. You're at liberty to see something different when you look at my text, but that shouldn't stop me from encoding my analysis, surely? It's nothing but a block, and many blocks can nest unless they have explicit semantics that prevent it; there are no semantics here, so what's the problem?

peterstadler commented 2 years ago

@martindholmes you convinced me, for what it's worth 👍

jamescummings commented 2 years ago

I'm still in camp 1. I think re-purposing the existing element does more harm than good.

sydb commented 2 years ago

I find myself on the fence on this one. But an analogy I just drew up pushes me more towards allowing <ab> to self-nest, rather than inventing a new “smaller-than-<div>, larger-than-phrase level” element that can self-nest, leaving <ab> as it is.

An <s> is, in some sense, a specialized <seg>, i.e. a <seg type="sentence">; the main syntactic difference is that an <s> cannot self-nest. In the same way, why isn’t a <p> a specialized <ab>, i.e. an <ab type="paragraph">, whose only syntactic difference is that a <p> cannot self-nest?

Given that I think of <seg> as a generic phrase-level element of which there are many specialized derivatives[1] like <w>, <s>, <ex>, and <c>, and I think of <div> as a generic division-level element of which there are a few specialized derivatives like <div1>, <divGen>, <lg>, and <epilogue>, then why not think of <ab> as a generic mid-level element with a bunch of specialized derivatives like <l>, <p>, <camera>, and <argument>?


[1] I am using the term "derivatives" very loosely here, because the specialized elements I am listing are not actually derived from the generic ones in the CS sense of inheritance. I bet @laurentromary would say they should be, though. 🙂

martindholmes commented 2 years ago

@jamescummings How are we repurposing it? It's an anonymous block. Its purpose is to be used when you have a block that doesn't fit into any of the existing categories with semantics.

lb42 commented 2 years ago

Actually and to my surprise, I find myself agreeing with Syd on the feeling that s is to seg as p is to ab. But then of course both Syd and I are constitutionally incapable of separating syntactic and semantic properties, which is why we have no problem representing this (purely) syntactic difference by different elements, and thereby asserting they have different semantics. OTOH I share James's disquiet at the definitely Birnbaum-infringing nature of this proposed change. At present p and ab constitute the model.pLike class, which is used in a gazillion content models. I'll bet that some weirdnesses will ensue if we leave the new proposed self-nesting ab in that class.

JanelleJenstad commented 2 years ago

What would be the Birnbaum-fallout if we moved ab to its own model.abLike class? There are plenty of elements that are the only one in their class.

lb42 commented 2 years ago

Well, the content models of all existing elements (not a few) which reference it via model.pLike would suddenly change for a start! Or are you suggesting making model.abLike a subclass of model.pLike?

martindholmes commented 2 years ago

Not sure what the most effective way to achieve the nesting would be, but we'd want to minimize disruption of course.

ebeshero commented 2 years ago

If we can do this properly, we won't see fallout over backwards compatibility, because non-nested <ab> elements will continue to be valid. But how would we accomplish it? What if we made model.pLike a subclass of a new model.abLike (rather than the other way around)? Would it make sense for the more permissive model to be the main class, and the more constrained one the subclass?
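To make the subclass question concrete, here is a minimal Python sketch of how membership propagates through TEI model classes: a class that is itself a member of another class contributes all of its members to that superclass. The class contents are simplified assumptions, and `model.abLike` is the hypothetical class proposed in this thread, not part of the current Guidelines.

```python
# Hypothetical sketch of TEI class-membership propagation.
# The memberships below are simplified assumptions, not the real TEI source.

def expand(cls, members):
    """Return the set of element names reachable from class `cls`."""
    result = set()
    for m in members.get(cls, []):
        if m.startswith("model."):
            result |= expand(m, members)  # nested class: pull in its members
        else:
            result.add(m)                 # plain element name
    return result

# Current situation: p and ab are both direct members of model.pLike.
current = {"model.pLike": ["p", "ab"]}

# One possible arrangement of the proposal: model.pLike becomes a member of a
# new model.abLike, and ab moves out of model.pLike into model.abLike.
proposed = {
    "model.abLike": ["ab", "model.pLike"],
    "model.pLike": ["p"],
}

print(sorted(expand("model.pLike", current)))    # ['ab', 'p']
print(sorted(expand("model.pLike", proposed)))   # ['p']
print(sorted(expand("model.abLike", proposed)))  # ['ab', 'p']
```

Under these assumptions, anything referencing `model.abLike` accepts both elements, but every existing content model that still references `model.pLike` would stop accepting `<ab>` unless it is repointed or `<ab>` also stays in `model.pLike`.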

ebeshero commented 2 years ago

Just reminded myself how we constrain <p> and <ab> (exactly the same way). It might be handy to look at our Schematron rule for abstract model violations:

  <constraintSpec ident="abstractModel-structure-p-in-ab-or-p" scheme="schematron">
    <constraint>
      <sch:report test="    (ancestor::tei:ab or ancestor::tei:p) 
                        and not( ancestor::tei:floatingText
                                |parent::tei:exemplum
                                |parent::tei:item
                                |parent::tei:note
                                |parent::tei:q
                                |parent::tei:quote
                                |parent::tei:remarks
                                |parent::tei:said
                                |parent::tei:sp
                                |parent::tei:stage
                                |parent::tei:cell
                                |parent::tei:figure
                               )">
        Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements.
      </sch:report>
    </constraint>
  </constraintSpec>
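For experimenting with what the rule above actually catches, the following Python sketch emulates its test using only the standard library's `xml.etree.ElementTree`. ElementTree lacks the XPath `ancestor::` and `parent::` axes, so a child-to-parent map is built by hand. It assumes, as noted here, that the rule on `<ab>` differs from the one on `<p>` only in the element it fires on; `violations` is a hypothetical helper, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Immediate parents that exempt a nested block (the parent:: tests in the rule).
EXEMPT_PARENTS = {"exemplum", "item", "note", "q", "quote", "remarks",
                  "said", "sp", "stage", "cell", "figure"}

def local(el):
    """Strip the TEI namespace from an element's tag."""
    return el.tag.replace(TEI, "")

def violations(xml_text, target="p"):
    """Return ancestor paths of `target` elements the rule would report."""
    root = ET.fromstring(xml_text)
    parent = {child: par for par in root.iter() for child in par}
    found = []
    for el in root.iter(TEI + target):
        ancestors = []                      # nearest ancestor first
        node = parent.get(el)
        while node is not None:
            ancestors.append(local(node))
            node = parent.get(node)
        in_block = "p" in ancestors or "ab" in ancestors
        exempt = ("floatingText" in ancestors
                  or (bool(ancestors) and ancestors[0] in EXEMPT_PARENTS))
        if in_block and not exempt:
            found.append("/".join(list(reversed(ancestors)) + [target]))
    return found

doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <ab><ab>nested block</ab></ab>
  <p>A paragraph with a <note><p>note paragraph</p></note></p>
</body></text></TEI>"""

print(violations(doc, "ab"))  # ['TEI/text/body/ab/ab']
print(violations(doc, "p"))   # []
```

The exemptions apply only to the immediate parent (plus any `<floatingText>` ancestor), so a `<p>` inside, say, `<hi>` inside a `<p>` would still be reported.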

ebeshero commented 2 years ago

For <ab> it's the same, and we're saying it can't go inside either a <p> or an <ab>. From our conversation here, I imagine we had better consider whether it's okay to nest <ab> inside <p> as well as <ab>. Is this heretical or just in keeping with the arbitrary anonymity of the "anonymous block"? Because I'm just imagining simply pulling the Schematron rule out of <ab>...and seeing what mayhem ensues!

ebeshero commented 2 years ago

Do we want a world in which:

Further heresy: What if <p> were permitted to nest inside <ab>, but never inside <p>?

ebeshero commented 2 years ago

Finally, it's important we link this with Council's recommendation for https://github.com/TEIC/TEI/issues/1929 (scroll to end). I think we may have found a way to clearly delineate the difference between <p> and <ab> while jettisoning "semantic baggage". I'll assign myself to this ticket with @martinascholger and @JanelleJenstad since I'm supposed to implement the other one. We'd better work on these together.

martindholmes commented 2 years ago

I'm inclined to say that since nobody has asked for <ab> to be able to nest inside <p>, we should probably not worry about that, and just handle the current request for nestable <ab>s. If this ticket turns into a philosophical debate over the nature of blocks and what-a-paragraph-is, it'll probably roll on for a couple of years and lead to nothing helpful. :-)

ebeshero commented 2 years ago

@martindholmes It's just that the easiest course is to remove the Schematron constraint on <ab> entirely... so that's why it's worth asking. The current ticket results in keeping the Schematron there, and just saying <ab> can't be inside <p>. The solution Council proposed for that "What is a paragraph, really?" ticket is pretty simple: we're supposed to be moving <ab> out of Linking and into Core. So, while we're at it, we might as well tackle the Schematron constraint, too.

ebeshero commented 2 years ago

What's striking to me is the long list of exceptions we've had to add to that Schematron constraint on <ab>. (I've had to add to them to resolve a ticket or two in the past.) If the constraint looks mostly the same for <p> and for <ab>, and we're only allowing <ab> to nest inside itself, then it really does behave mostly like <p> (it can't nest within a <p> unless there's a <floatingText> or <exemplum> or <item> or <stage> or <note> or <q> or <quote> etc. etc. in between). And if we're sticking with that complexity, not allowing it inside <p> except via the long list of exceptions, then I think <ab> still belongs in model.pLike, with just a tiny special exception in the old Schematron.

martindholmes commented 2 years ago

@ebeshero I think your argument amounts to saying that in the past we've treated it like a p so we should continue to, but that's what I disagree with. It's not a p.

ebeshero commented 2 years ago

@martindholmes But you yourself just wrote today that if no one asked for <ab> to nest inside <p> we shouldn't worry about it. I'm not really arguing one way or the other. I'm just observing how <ab> is currently constrained from nesting and wondering how to proceed with changing it. It seems to me that we can either do:

  1. a simple change (exactly and only what this ticket asks for): Allow <ab> to nest inside <ab> and don't let it nest inside <p>. From what I can see, that's a small change in the current Schematron constraint.

  2. a larger change (that would mean <ab> doesn't, indeed, behave like <p> at all): This is also potentially quite simple, but permits it to nest in more places (indeed, within <p>): What if we remove the Schematron constraint entirely?
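The difference between the two options can be tabulated with a compact Python sketch that models only the Schematron half of the change. Each case is the ancestor chain of an `<ab>`, outermost first, and `blocking` is the set of ancestors that trigger the check; the exemption list is taken from the rule quoted earlier. The option names and test cases are illustrative assumptions. As hcayless points out in this thread, the content model of `<ab>` would also have to change, which this sketch does not model.

```python
# Hypothetical sketch: what the <ab> constraint would flag under each option.
EXEMPT_PARENTS = {"exemplum", "item", "note", "q", "quote", "remarks",
                  "said", "sp", "stage", "cell", "figure"}

def flagged(chain, blocking):
    """True if an <ab> with these ancestors (outermost first) is reported."""
    if not any(a in blocking for a in chain):
        return False                 # no blocking ancestor: nothing to check
    if "floatingText" in chain:
        return False                 # floatingText anywhere above exempts it
    return chain[-1] not in EXEMPT_PARENTS  # otherwise only the parent counts

OPTIONS = {
    "current": {"p", "ab"},   # ab may not sit inside p or ab
    "option 1": {"p"},        # ab may nest in ab, still not in p
    "option 2": set(),        # Schematron constraint removed from ab
}

CASES = {
    "ab in ab": ["body", "ab"],
    "ab in p": ["body", "p"],
    "ab in note in p": ["body", "p", "note"],
}

for name, blocking in OPTIONS.items():
    hits = [case for case, chain in CASES.items() if flagged(chain, blocking)]
    print(f"{name}: {hits}")
# current: ['ab in ab', 'ab in p']
# option 1: ['ab in p']
# option 2: []
```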

martindholmes commented 2 years ago

I would vote for #1. If anyone later has examples which support allowing ab inside p, then that would be a different FR.

(In that case opponents might argue that p has semantics which preclude other blocks appearing inside it, although that wouldn't actually be very convincing given that lists are allowed in p.)

hcayless commented 2 years ago

It's not a simple Schematron change though. That rule is meant to prevent, e.g. p inside something else (with a bunch of exceptions) inside p. Fwiw, I never thought that rule was a great idea. It was meant to calm fears that app would allow people to do crazy stuff.

Anyway, the content model of ab would need to be changed and the Schematron rule would probably need to be removed or substantially rewritten for ab's new status.

ebeshero commented 2 years ago

Well, Council's recommendation to move <ab> to Core (https://github.com/TEIC/TEI/issues/1929#issuecomment-943621793) is a good opportunity to rewrite its content model. Since that kludgy Schematron serves to prevent deep nesting, perhaps just removing it from <ab> is an improvement? I suppose we would also need to add <ab> to its own content model. But these changes do not seem as enormous as we might expect, unless I'm missing something?

ebeshero commented 2 years ago

And since we are talking about self-nesting here, of course we need to change the content model, and do something about the kludgy Schematron: either tweak it or eliminate it. I would like to just get rid of it on <ab>.