TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

att.spanning value targeting an external document #2246

Open emamimohsen opened 2 years ago

emamimohsen commented 2 years ago

The current att.spanning (@spanTo) constraint looks for a matching element in the current document. Sometimes the ending element is located in a different document. So it seems reasonable to check for a matching element only if the @spanTo value starts with a '#' (internal reference); otherwise, it should try to open the referenced document and check inside it for the xml:id. I'm not sure whether or how this is possible in Schematron, since I'm new to it.

Thanks to @hcayless.

ebeshero commented 2 years ago

Hi @emamimohsen. Yes, you can check for the values in an external document using Schematron. Somewhere (best, I think, in the teiHeader of your document) you should specify the relative filepath to the other document, or an absolute filepath. In Schematron you can use the XPath doc() function to point to that document and delineate the XPath to the @xml:ids you need to check.
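
A minimal sketch of such a cross-document check, not the TEI rule itself. It assumes an XSLT 2.0 query binding (so that doc(), doc-available() and resolve-uri() are available), and it assumes the file path is taken from the @spanTo value itself (e.g. a hypothetical "chapter02.xml#add01-end") rather than from the teiHeader. The variable names are illustrative.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Only pointers of the form file#id; same-file pointers ("#id") are skipped -->
    <sch:rule context="tei:*[contains(@spanTo, '#')][not(starts-with(@spanTo, '#'))]">
      <!-- Resolve the file part of the pointer against the current file's location -->
      <sch:let name="uri" value="resolve-uri(substring-before(@spanTo, '#'), base-uri(.))"/>
      <sch:let name="targetId" value="substring-after(@spanTo, '#')"/>
      <sch:assert test="doc-available($uri)">
        The file named in @spanTo (<sch:value-of select="@spanTo"/>) could not be opened.
      </sch:assert>
      <!-- Only look for the xml:id when the file could be opened -->
      <sch:assert test="not(doc-available($uri)) or doc($uri)//*[@xml:id = $targetId]">
        No element with xml:id="<sch:value-of select="$targetId"/>" was found in the file named in @spanTo.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

This only confirms that the target exists in the other file; it says nothing about whether that file comes "after" the current one.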

hcayless commented 2 years ago

@spanTo currently has this overly restrictive Schematron rule. See https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.spanning.html. @emamimohsen would like it either 1) relaxed, so that it applies only to targets beginning with '#' or 2) made able to check across documents. I think doing 1 is probably way easier. Right now, it's causing problems because their project has multiple, sequential TEI documents per work, and so a span might begin in one document and end in another.

sydb commented 2 years ago

This constraint was added in 2012. I have a vague recollection that the feeling at the time was that no, it does not really make sense to have an addition, deletion, et al. span across documents. At the same time we (or at least, I) acknowledged that it is possible that such an item would span across files in a filesystem because a document was being stored in multiple files. Some argued, IIRC, that in such a case very likely the individual files would be combined into one driver document (using XInclude or whatever), and that validation would apply to that driver document. What, @emamimohsen, is your use case for spanning to a different file?
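
For illustration, a driver document of the kind described might look roughly like the sketch below. The file names are hypothetical, and wrapping the chapter files in a teiCorpus is only one assumed way to combine them; it also assumes an XInclude-aware parser resolves the inclusions before validation.

```xml
<!-- driver.xml (hypothetical name): pulls the per-chapter files into one tree -->
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Driver document for the whole work</title></titleStmt>
      <publicationStmt><p>Unpublished sketch.</p></publicationStmt>
      <sourceDesc><p>Assembled from the chapter files listed below.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Each href is an assumed file name; each file is a complete TEI document -->
  <xi:include href="chapter01.xml"/>
  <xi:include href="chapter02.xml"/>
</teiCorpus>
```

Once the inclusions are resolved, a @spanTo written as a bare '#' reference in one chapter can find its target in a later chapter, because both then sit in the same tree.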

hcayless commented 2 years ago

You say that as though there's a clean distinction between files and documents :-). Nonetheless, I think it's probably bad policy to have constraints that rule out legitimate usages, such as splitting a large work into separate TEI files.

ebeshero commented 2 years ago

Well, I’m surprised we have such a strange rule. It seems counterintuitive to me to disallow spanning to another file in a collection so long as you can express a relationship among files. (Sorry for misunderstanding. I initially thought the question was just about how to validate a @spanTo value in another document.)

sydb commented 2 years ago

While I lean towards agreeing with your conclusion, @hcayless (that constraints should not rule out legitimate, even if rare, usages), there is a very clean distinction between my usage of “file” and of “document”. The former is a feature of your operating system; the latter is, roughly speaking, a <tei:text>. A document typically resides in 1 file, but may be spread over many. A single file cannot contain multiple documents, unless they are grouped into a <tei:group> or part of a corpus. (Because of XML’s arguably pointless “single outermost element” constraint.)

All that said, I would still like to know what this particular use case is.

hcayless commented 2 years ago

@emamimohsen's original question went to the Markup list https://lsv.uky.edu/scripts/wa.exe?A2=MARKUP;f94acbba.2203&S= It sounds like the work in question is broken up into chapters, one TEI document per chapter, but added or deleted sections can span chapters.

@sydb, I understand the distinction you're making, but it's nearly the opposite of the one I'd make. All documents are files, some files are documents.

emamimohsen commented 2 years ago

> What, @emamimohsen, is your use case for spanning to a different file?

In my case, every chapter of a manuscript is stored in a standalone XML file. There are cases where a spanning section starts in one chapter and ends in another. In these cases, the element that @spanTo points to is located in a different file from the starting point.
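
A rough illustration of that situation, with hypothetical file names and xml:id values (addSpan and anchor are used here only as an example of a spanning element and its end point):

```xml
<!-- chapter01.xml (hypothetical file name): the addition starts here -->
<p>... end of chapter one <addSpan spanTo="chapter02.xml#add01-end"/>text added
  across the chapter break ...</p>

<!-- chapter02.xml (hypothetical file name): the addition ends here -->
<p>... remainder of the added text<anchor xml:id="add01-end"/> original text
  resumes ...</p>
```

Validated on its own, chapter01.xml fails the current constraint, because no element with that xml:id follows the addSpan in the same file.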

Although merging all the files into a hyper file (using e.g. XInclude) is a good way to represent the logical sequence of chapters, I want each separate file to be usable and shareable as a stand-alone file, and to validate correctly on its own.

emamimohsen commented 2 years ago

> While I lean towards agreeing with your conclusion, @hcayless (that constraints should not rule out legitimate, even if rare, usages), there is a very clean distinction between my usage of “file” and of “document”. The former is a feature of your operating system; the latter is, roughly speaking, a <tei:text>. A document typically resides in 1 file, but may be spread over many. A single file cannot contain multiple documents, unless they are grouped into a <tei:group> or part of a corpus. (Because of XML’s arguably pointless “single outermost element” constraint.)

I'm not very familiar with the TEI terminology. Thank you for clarifying the terms. I'll try to keep it tidy.

sydb commented 2 years ago

Talking this ticket over with @bleekere, it seems to us that @emamimohsen has a reasonable request for a reasonable (if rare) use case. Furthermore, of @hcayless’ two proposed solutions, (1) (don’t test whether the element @spanTo points at comes after the current element unless the @spanTo starts with '#') is all but trivial, and (2) (test that the file pointed at by the @spanTo comes later) is all but impossible.

Thus we think that @hcayless’ solution (1) should be implemented with due haste. That said, we think this needs discussion to have someone double-check that this is a good idea, and wonder about some other possible solutions:

  1. Leave test as it is, but change it to role="warning".
  2. In addition to (1), give user a warning that this test is not being performed.
  3. Drop the test altogether.
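
A sketch of how the first two alternatives might look in Schematron, under one reading of them. It assumes the sch: and tei: prefixes are already bound, that the existing test is the one quoted in the rule further down this thread, and that the validating tool honours role="warning"; the message wording is invented.

```xml
<sch:pattern>
  <!-- Alternative 2 (one reading): tell the user the ordering test is skipped
       for pointers that do not start with '#'. This rule must come first,
       since a node only fires the first rule in a pattern that matches it. -->
  <sch:rule context="tei:*[@spanTo][not(starts-with(@spanTo, '#'))]">
    <sch:report test="true()" role="warning">
      @spanTo (<sch:value-of select="@spanTo"/>) points outside this file; its position is not checked.
    </sch:report>
  </sch:rule>
  <!-- Alternative 1: keep the test, but downgrade it from an error to a warning -->
  <sch:rule context="tei:*[@spanTo]">
    <sch:assert role="warning"
                test="id(substring(@spanTo,2)) and following::*[@xml:id=substring(current()/@spanTo,2)]">
      The element indicated by @spanTo (<sch:value-of select="@spanTo"/>) should follow the current element <sch:name/>
    </sch:assert>
  </sch:rule>
</sch:pattern>
```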

martindholmes commented 1 year ago

The Schematron could be modified thus:

<sch:rule context="tei:*[starts-with(@spanTo, '#')]">
  <sch:assert test="id(substring(@spanTo,2)) and following::*[@xml:id=substring(current()/@spanTo,2)]">
    The element indicated by @spanTo (<sch:value-of select="@spanTo"/>) must follow the current element <sch:name/>
  </sch:assert>
</sch:rule>

per Council discussion 2022-09-12.
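
To make the effect of the modified rule concrete, here are two illustrative same-file fragments; the xml:id values and surrounding text are invented.

```xml
<!-- Passes: the anchor named by @spanTo follows the addSpan in the same file -->
<p>Original text <addSpan spanTo="#add01-end"/>added words<anchor xml:id="add01-end"/> more text.</p>

<!-- Flagged: the anchor named by @spanTo comes before the delSpan -->
<p><anchor xml:id="del01-end"/>Original text <delSpan spanTo="#del01-end"/>deleted words.</p>
```

A @spanTo value that does not begin with '#' no longer matches the rule's context, so it is simply not tested.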