TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org
Other
279 stars 88 forks source link

Review placement of classes, macros, and datatypes in modules #1710

Open raffazizzi opened 6 years ago

raffazizzi commented 6 years ago

Many classes, macros, and datatypes are in the tei module, but they may be placed in more appropriate modules. Below is a list classes, macros, and datatypes grouped by modules and paired with modules they )are referenced in. (The list is generated with this badly written XSLT on p5subset.xml)

We probably should keep in tei all those classes that are shared between multiple modules. Also note some datatypes and macros that appear to not be used anywhere (not a bad thing per se)

ATTRIBUTE CLASSES

tei

namesdates

spoken

core

header

verse

dictionaries

msdescription

transcr

textcrit

linking

analysis

tagdocs

MODEL CLASSES

tei

spoken

textcrit

namesdates

dictionaries

tagdocs

DATATYPES

tei

MACROS

tei

tagdocs

emylonas commented 6 years ago

This is a substantial and high level task that will require more than 1 person and some time. Assigning to @martinascholger in case it needs a workgroup or multiple people,

raffazizzi commented 6 years ago

@martinascholger @jamescummings and I had a meeting about this issue today and are leaning in favor of moving some classes from tei to another module. These would be classes that have members and references from one module only. Classes with members and references from multiple modules must stay in tei.

We would like to get comments from council before going ahead with this. Also we need further discussion about datatypes, see below.

Examples:

Pros (why should we bother?)

The first two reasons are possibly not really worth the time we'd have to spend to reorganize the classes, but the third reason (<elementRef>) makes it more urgent.

Let's consider <annotationBlock> from spoken and include it in a customization via <elementRef>.

<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="tei"/>
<moduleRef key="header"/>

<elementRef key="annotationBlock" />
<!-- we also need to reference model.divPart.spoken for annotationBlock 
       to be reachable (not ZAPped) -->
<classRef key="model.divPart.spoken" />

With the following ODD, <annotationBlock> will not have @dur because att.duration is currently part of the spoken module. This seems reasonable to us (we'll return to this idea later).

On the other hand, let's now include <orth> from dictionaries via <elementRef>:

<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="tei"/>
<moduleRef key="header"/>

<elementRef key="orth" />

<orth> will have @extent because att.partials is part of the tei module. This is inconsistent with the example above.

We propose that the two ODDs above should behave the same for both elements (i.e. more predictably) and that cherry-picking with *Ref elements should be explicit.

To fully include <annotationBlock>:

<elementRef key="annotationBlock" />
<classRef key="model.divPart.spoken" />
<classRef key="att.duration" />
<classRef key="att.duration.iso" />
<classRef key="att.duration.w3c" />
<!-- other shared classes are already part of the tei module -->

To fully include <orth>

<elementRef key="orth" />
<classRef key="att.partials" />
<!-- other shared classes are already part of the tei module -->

Or more simply use <moduleRef> as it includes all classes from the referenced module by default (as it should):

<moduleRef key="spoken" include="annotationBlock" />
<moduleRef key="dictionaries" include="orth" />

We think that it's fair to require users who use *Ref elements to be aware of the class system and cherry-pick more precisely, but this could be a point of discussion. We could also make the processing more sophisticated and have it figure out the dependencies for consistent behavior.

Cons

Macros and Datatypes

Macros are all used by elements in more than one module, so they must stay where they are. Though future macros should follow the same principle as classes.

We didn't have time to discuss datatypes properly. Only a handful of them are used by attributes defined in one module only. @jamescummings thinks datatypes seem like the kind of definition that should be available to all modules (i.e. keep them all in tei); @raffazizzi prefers consistency with the class system and suggests to move datatypes used by attributes defined in only one module to be moved to the module.

martindholmes commented 6 years ago

Since this is an odd-breaking move, I'm basically against it for existing classes. I'm fully in favour of creating new classes only in the module they're required in, if they are specific to that volume, but I think the minor benefits of moving existing classes are outweighed by the potentially large fallout in breaking existing ODDs. In particular, the tendency will be to move little-used classes, and those classes are precisely the ones that may have been explicitly deleted in existing odd files, causing potential breakage if they're moved.

raffazizzi commented 6 years ago

@martindholmes can you provide examples of how this would break existing ODDs? Explicitly deleting does not require specifying the module so where the class is should be irrelevant.

I just tested these two assertions and got the same result (att.partials got deleted with both ODDs).

<classSpec ident="att.partials" module="tei" mode="delete" type="atts"/>

vs.

<!-- wrong now, but would be correct if classes were reorganized -->
<classSpec ident="att.partials" module="dictionaries" mode="delete" type="atts"/>
martindholmes commented 6 years ago

@raffazizzi I guess I'm considering how the stylesheets should work rather than how they do work. :-) I was assuming that specifying a module for a class which is not the module in which it appears would generate an error -- and in fact I believe it should -- but if it doesn't (even if using @sydb's TEI for TEI Customizations), then I withdraw my objection. We would have to be aware that any change to ODD processing that caused it to pay attention to the @module attribute would cause breakage for older ODDs, though.

lb42 commented 6 years ago

On 12/10/18 19:02, Martin Holmes wrote:

@raffazizzi https://github.com/raffazizzi I guess I'm considering how the stylesheets should work rather than how they do work. :-) I was assuming that specifying a module for a class which is not the module in which it appears would generate an error -- and in fact I believe it should -- but if it doesn't (even if using @sydb https://github.com/sydb's TEI for TEI Customizations), then I withdraw my objection. We would have to be aware that any change to ODD processing that caused it to pay attention to the |@module| attribute would cause breakage for older ODDs, though.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/TEIC/TEI/issues/1710#issuecomment-429410502, or mute the thread https://github.com/notifications/unsubscribe-auth/AAoU9Kj8CRs1mbioKHVtqs7PcEE6spCcks5ukNk6gaJpZM4Qi1tr.

I can think of no instance where the @module attribute is of any significance in ODD processing. You might need it if there were ever two objects of the same name in different modules, but  we don't allow that to happen any more.

Where this proposal might lead to backwards compatibility problems is when people have used (correctly) to pull in an obscure class they wouldn't otherwise have available because that module isn't otherwise needed. For example, you might want @facs but nothing else from the transcription module. If for some bizarre reason a future reorganization decided to move the class att.global.facs away from that module to some other one, your ODD might well cease to behave as you expected.

In general, ODD processing fails gracefully: which is a nice way of saying that it won't fall over in a heap with fifty lines of incomprehensible messages (yes, I'm looking at you Mrs Java) if you do something stupid. But it won't point out your errors either.

raffazizzi commented 6 years ago

@lb42 AFAIK you can't cherry-pick classes using <moduleRef>, but I guess one could use <moduleRef key="transcr" except="ALL_THE_TRANSCR_ELEMENTS" /> to get just the classes, including att.global.facs.

I agree that moving classes from one module to another would possibly break ODDs, but moving classes used (via membership or reference) only by a certain module to that module shouldn't break anything. Right?

lb42 commented 6 years ago

@raffazizzi : yes, you always get all of its classes by default when you supply a <moduleRef> but that wasn't my point. I was just saying that one should be careful about moving classSpecs out of the tei module to another. To answer the query about the history you raised above: it's probably all my fault: I felt that having all the classes declared in one module (tei) was a helpful simplification and always put them there by default. Sebastian disagreed though, and eventually persuaded me that "specialised" classes should belong in "specialised" modules. But of course what may seem specialised today might be considered generically useful tomorrow (@facs is a case in point), so James's caution is well-placed.

raffazizzi commented 4 years ago

I will run the script again and make fixes as needed, then post the list again and next create a PR with the moved classes for review.

sydb commented 4 years ago

Green for @raffazizzi to generate an updated a list of those classes that should be moved (from tei to specific module), and run said list by Council. Depending on how many there are, how they feel, etc., @martinascholger and @raffazizzi will then move the classes manually or write a routine to do so (and check it in to the Utilities/) directory in a separate branch.

ebeshero commented 4 years ago

Also, Council F2F recommends doing a special release for this when it's ready.

raffazizzi commented 4 years ago

ATTRIBUTE CLASSES

tei

namesdates

spoken

core

figures

header

gaiji

verse

dictionaries

msdescription

transcr

textcrit

linking

analysis

tagdocs

MODEL CLASSES

tei

spoken

textcrit

msdescription

namesdates

dictionaries

iso-fs

tagdocs

DATATYPES

tei

MACROS

tei

raffazizzi commented 4 years ago

Proposed changes

Attribute classes

Class Current module Target module Reason
att.damaged tei transcr all members are in transcr
att.breaking tei core all members are in core
att.cReferencing tei core all members are in core
att.duration.w3c tei spoken all members are in spoken
att.media tei core all members are in core
att.interpLike tei analysis all members are in analysis
att.measurement tei core all members are in core
att.pointing.group tei linking all members are in linking
att.scoping tei certainty all members are in certainty
att.styleDef tei header all members are in header
att.citing tei core all members are in core
att.formula tei header all members are in header
att.partials tei dictionaries all members are in dictionaries
att.duration.iso tei spoken all members are in spoken

Model Classes

Class Current module Target module Reason
model.hiLike tei core all references are in core
model.dateLike tei core all references are in core
model.dimLike tei msdescription all references are in msdescription
model.egLike tei tagdocs all references are in tagdocs
model.offsetLike tei namesdates all references are in namesdates
model.pPart.msdesc tei msdescription all references are in msdescription
model.ptrLike tei core all references are in core
model.lPart tei verse all references are in verse
model.gLike tei gaiji all references are in gaiji
model.oddDecl tei tagdocs all references are in tagdocs
model.oddRef tei tagdocs all references are in tagdocs
model.phrase.xml tei tagdocs all references are in tagdocs
model.specDescLike tei tagdocs all references are in tagdocs
model.headLike tei core all references are in core
model.labelLike tei core all references are in core
model.noteLike tei core all references are in core
model.lLike tei core all references are in core
model.featureVal.complex tei iso-fs all references are in iso-fs
model.featureVal.single tei iso-fs all references are in iso-fs
model.entryPart tei dictionaries all references are in dictionaries
model.eventLike tei namesdates all references are in namesdates
model.persStateLike tei namesdates all references are in namesdates
model.personLike tei namesdates all references are in namesdates
model.placeNamePart tei namesdates all references are in namesdates
model.availabilityPart tei header all references are in header
model.certLike tei certainty all references are in certainty
model.descLike tei core all references are in core
model.quoteLike tei core all references are in core
model.frontPart.drama tei drama all references are in drama
model.divBottomPart tei textstructure all references are in textstructure
model.catDescPart tei corpus all references are in corpus
model.textDescPart tei corpus all references are in corpus
model.castItemPart tei drama all references are in drama
model.divLike tei textstructure all references are in textstructure
model.divGenLike tei core all references are in core
model.div1Like tei textstructure all references are in textstructure
model.div2Like tei textstructure all references are in textstructure
model.div3Like tei textstructure all references are in textstructure
model.div4Like tei textstructure all references are in textstructure
model.div5Like tei textstructure all references are in textstructure
model.div6Like tei textstructure all references are in textstructure
model.div7Like tei textstructure all references are in textstructure
model.applicationLike tei header all references are in header
model.teiHeaderPart tei header all references are in header
model.sourceDescPart tei spoken all references are in spoken
model.editorialDeclPart tei header all references are in header
model.objectLike tei namesdates all references are in namesdates
model.orgStateLike tei namesdates all references are in namesdates
model.placeLike tei namesdates all references are in namesdates

Datatypes

Class Current module Target module Reason
teidata.interval tei linking all references are in linking
teidata.namespaceOrName tei tagdocs all references are in tagdocs
teidata.point tei transcr all references are in transcr
teidata.replacement tei header all references are in header
teidata.sex tei namesdates all references are in namesdates
teidata.prefix tei header all references are in header
teidata.unboundedInt tei tagdocs all references are in tagdocs
teidata.point tei transcr all references are in transcr
teidata.temporal.iso tei namesdates all references are in namesdates
raffazizzi commented 4 years ago

As asked by the subgroup, I have created an update list and summarized the proposed changes (previous comment)

@martinascholger since you're also assigned perhaps you can do a quick review? If it looks ok I can make this into a Pull Request

martinascholger commented 4 years ago

@raffazizzi looks good. I wonder if it might be a problem to move att.duration.w3c, att.duration.iso, and teidata.temporal.iso because of their members'/references' membership.

raffazizzi commented 4 years ago

Thanks @martinascholger !

att.duration is already in spoken even though it has members in core. So att.duration.w3c, att.duration.iso are already only usable when spoken is included, but they are defined on tei.

teidata.temporal.iso seems to only be used by att.datable.iso, which is defined in namesdates, so I think it makes sense to move teidata.temporal.iso from tei to namesdates.

Let me know if I missed something.

martinascholger commented 4 years ago

Makes sense. Let me know if I can help with the PR.

sydb commented 1 year ago

Re-creation of @raffazizzi’s table based on v. 4.7.0a, revision b432af815.

move class into module change prose**
att.anchoring core no
att.damaged transcr no
att.breaking core no
att.cReferencing core no
att.duration.w3c spoken yes, remark in att.duration
att.media core no
att.interpLike analysis no
att.measurement core no
att.pointing.group linking no
att.scoping certainty no
att.styleDef header no
att.citing core no
att.formula header no
att.partials dictionaries no
att.duration.iso spoken no
att.enjamb core this should remain in the 'verse' module
model.hiLike core no
model.dateLike core no
model.dimLike msdescription no
model.egLike tagdocs no
model.offsetLike namesdates no
model.pPart.msdesc msdescription no
model.lPart verse no
model.gLike gaiji no
model.oddDecl tagdocs no
model.oddRef tagdocs no
model.phrase.xml tagdocs no
model.specDescLike tagdocs no
model.headLike core no
model.labelLike core no
model.noteLike core no
model.lLike core no
model.featureVal.complex iso-fs no
model.featureVal.single iso-fs no
model.entryPart dictionaries no
model.eventLike namesdates no
model.persStateLike namesdates
model.personLike namesdates
model.placeNamePart namesdates
model.availabilityPart header
model.certLike certainty
model.descLike core
model.quoteLike core
model.frontPart.drama drama
model.divBottomPart textstructure
model.catDescPart corpus
model.textDescPart corpus
model.castItemPart drama
model.divLike textstructure
model.divGenLike core
model.div1Like textstructure
model.div2Like textstructure
model.div3Like textstructure
model.div4Like textstructure
model.div5Like textstructure
model.div6Like textstructure
model.div7Like textstructure
model.annotationPart.body core
model.applicationLike header
model.teiHeaderPart header
model.sourceDescPart spoken
model.editorialDeclPart header
model.objectLike namesdates
model.orgStateLike namesdates
model.placeLike namesdates
teidata.interval linking
teidata.temporal.working tagdocs
teidata.namespaceOrName tagdocs
teidata.point transcr
teidata.authority drama
teidata.replacement header
teidata.prefix header
teidata.unboundedCount tagdocs
teidata.point transcr
teidata.point transcr
teidata.temporal.iso namesdates
macro.abContent linking
martinascholger commented 1 year ago

This section needs an update: https://tei-c.org/release/doc/tei-p5-doc/en/html/ST.html#STECAT, especially "Some attribute classes are defined within the tei infrastructural module and are thus globally available. Other attribute classes are specific to particular modules and thus defined in other chapters. Attributes defined by such classes will not be available unless the module concerned is included in a schema."

raffazizzi commented 1 year ago

F2F@Guelph: @sydb had a solid strategy to make sure this can be implemented without breaking DTDs. However, the group discussed it again and is struggling to see the benefits given the amount of work needed. Interestingly, making this change would make selection via <elementRef> miss even more classes if they get moved away from the tei module. We also think that with #1744 there will be an opportunity to revisit the class system and its relationship to modules. So we are blocking this issue until we resolve #1744 and the futures issues emerging from it (as listed on that ticket).