Open raffazizzi opened 6 years ago
This is a substantial and high level task that will require more than 1 person and some time. Assigning to @martinascholger in case it needs a workgroup or multiple people,
@martinascholger @jamescummings and I had a meeting about this issue today and are leaning in favor of moving some classes from tei
to another module. These would be classes that have members and references from one module only. Classes with members and references from multiple modules must stay in tei
.
We would like to get comments from council before going ahead with this. Also we need further discussion about datatypes, see below.
att.duration.iso
(Members are in: spoken
) should be part of the spoken
module, just like its parent class att.duration
model.dimLike
(Members are in: msdescription
) (Referenced in: msdescription
) should be part of the msdescription
module<elementRef>
The first two reasons are possibly not really worth the time we'd have to spend to reorganize the classes, but the third reason (<elementRef>
) makes it more urgent.
Let's consider <annotationBlock>
from spoken
and include it in a customization via <elementRef>
.
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="tei"/>
<moduleRef key="header"/>
<elementRef key="annotationBlock" />
<!-- we also need to reference model.divPart.spoken for annotationBlock
to be reachable (not ZAPped) -->
<classRef key="model.divPart.spoken" />
With the following ODD, <annotationBlock>
will not have @dur
because att.duration
is currently part of the spoken
module. This seems reasonable to us (we'll return to this idea later).
On the other hand, let's now include <orth>
from dictionaries
via <elementRef>
:
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="tei"/>
<moduleRef key="header"/>
<elementRef key="orth" />
<orth>
will have @extent
because att.partials
is part of the tei
module. This is inconsistent with the example above.
We propose that the two ODDs above should behave the same for both elements (i.e. more predictably) and that cherry-picking with *Ref
elements should be explicit.
To fully include <annotationBlock>
:
<elementRef key="annotationBlock" />
<classRef key="model.divPart.spoken" />
<classRef key="att.duration" />
<classRef key="att.duration.iso" />
<classRef key="att.duration.w3c" />
<!-- other shared classes are already part of the tei module -->
To fully include <orth>
<elementRef key="orth" />
<classRef key="att.partials" />
<!-- other shared classes are already part of the tei module -->
Or more simply use <moduleRef>
as it includes all classes from the referenced module by default (as it should):
<moduleRef key="spoken" include="annotationBlock" />
<moduleRef key="dictionaries" include="orth" />
We think that it's fair to require users who use *Ref
elements to be aware of the class system and cherry-pick more precisely, but this could be a point of discussion. We could also make the processing more sophisticated and have it figure out the dependencies for consistent behavior.
tei
to elsewhere? We think they would get ZAPped in processing anyway, so there would be no risk, as long as they're definitely only used (via membership or reference) in one module. Are we correct?tei
even though they're only used in one other module. @jamescummings suggested it could be because they seem generic enough that they could apply to other elements in the future. We think it makes more sense to keep classes in their modules and only move them to tei
if they are suddenly needed elsewhere. Do others have an historical perspective? @lb42 @sydb ?Macros are all used by elements in more than one module, so they must stay where they are. Though future macros should follow the same principle as classes.
We didn't have time to discuss datatypes properly. Only a handful of them are used by attributes defined in one module only. @jamescummings thinks datatypes seem like the kind of definition that should be available to all modules (i.e. keep them all in tei
); @raffazizzi prefers consistency with the class system and suggests to move datatypes used by attributes defined in only one module to be moved to the module.
Since this is an odd-breaking move, I'm basically against it for existing classes. I'm fully in favour of creating new classes only in the module they're required in, if they are specific to that volume, but I think the minor benefits of moving existing classes are outweighed by the potentially large fallout in breaking existing ODDs. In particular, the tendency will be to move little-used classes, and those classes are precisely the ones that may have been explicitly deleted in existing odd files, causing potential breakage if they're moved.
@martindholmes can you provide examples of how this would break existing ODDs? Explicitly deleting does not require specifying the module so where the class is should be irrelevant.
I just tested these two assertions and got the same result (att.partials
got deleted with both ODDs).
<classSpec ident="att.partials" module="tei" mode="delete" type="atts"/>
vs.
<!-- wrong now, but would be correct if classes were reorganized -->
<classSpec ident="att.partials" module="dictionaries" mode="delete" type="atts"/>
@raffazizzi I guess I'm considering how the stylesheets should work rather than how they do work. :-) I was assuming that specifying a module for a class which is not the module in which it appears would generate an error -- and in fact I believe it should -- but if it doesn't (even if using @sydb's TEI for TEI Customizations), then I withdraw my objection. We would have to be aware that any change to ODD processing that caused it to pay attention to the @module
attribute would cause breakage for older ODDs, though.
On 12/10/18 19:02, Martin Holmes wrote:
@raffazizzi https://github.com/raffazizzi I guess I'm considering how the stylesheets should work rather than how they do work. :-) I was assuming that specifying a module for a class which is not the module in which it appears would generate an error -- and in fact I believe it should -- but if it doesn't (even if using @sydb https://github.com/sydb's TEI for TEI Customizations), then I withdraw my objection. We would have to be aware that any change to ODD processing that caused it to pay attention to the |@module| attribute would cause breakage for older ODDs, though.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/TEIC/TEI/issues/1710#issuecomment-429410502, or mute the thread https://github.com/notifications/unsubscribe-auth/AAoU9Kj8CRs1mbioKHVtqs7PcEE6spCcks5ukNk6gaJpZM4Qi1tr.
I can think of no instance where the @module attribute is of any significance in ODD processing. You might need it if there were ever two objects of the same name in different modules, but we don't allow that to happen any more.
Where this proposal might lead to backwards compatibility problems is
when people have used
In general, ODD processing fails gracefully: which is a nice way of saying that it won't fall over in a heap with fifty lines of incomprehensible messages (yes, I'm looking at you Mrs Java) if you do something stupid. But it won't point out your errors either.
@lb42 AFAIK you can't cherry-pick classes using <moduleRef>
, but I guess one could use <moduleRef key="transcr" except="ALL_THE_TRANSCR_ELEMENTS" />
to get just the classes, including att.global.facs
.
I agree that moving classes from one module to another would possibly break ODDs, but moving classes used (via membership or reference) only by a certain module to that module shouldn't break anything. Right?
@raffazizzi : yes, you always get all of its classes by default when you supply a <moduleRef>
but that wasn't my point. I was just saying that one should be careful about moving classSpecs out of the tei module to another. To answer the query about the history you raised above: it's probably all my fault: I felt that having all the classes declared in one module (tei) was a helpful simplification and always put them there by default. Sebastian disagreed though, and eventually persuaded me that "specialised" classes should belong in "specialised" modules. But of course what may seem specialised today might be considered generically useful tomorrow (@facs is a case in point), so James's caution is well-placed.
I will run the script again and make fixes as needed, then post the list again and next create a PR with the moved classes for review.
Green for @raffazizzi to generate an updated a list of those classes that should be moved (from tei to specific module), and run said list by Council. Depending on how many there are, how they feel, etc., @martinascholger and @raffazizzi will then move the classes manually or write a routine to do so (and check it in to the Utilities/) directory in a separate branch.
Also, Council F2F recommends doing a special release for this when it's ready.
Class | Current module | Target module | Reason |
---|---|---|---|
att.damaged | tei | transcr | all members are in transcr |
att.breaking | tei | core | all members are in core |
att.cReferencing | tei | core | all members are in core |
att.duration.w3c | tei | spoken | all members are in spoken |
att.media | tei | core | all members are in core |
att.interpLike | tei | analysis | all members are in analysis |
att.measurement | tei | core | all members are in core |
att.pointing.group | tei | linking | all members are in linking |
att.scoping | tei | certainty | all members are in certainty |
att.styleDef | tei | header | all members are in header |
att.citing | tei | core | all members are in core |
att.formula | tei | header | all members are in header |
att.partials | tei | dictionaries | all members are in dictionaries |
att.duration.iso | tei | spoken | all members are in spoken |
Class | Current module | Target module | Reason |
---|---|---|---|
model.hiLike | tei | core | all references are in core |
model.dateLike | tei | core | all references are in core |
model.dimLike | tei | msdescription | all references are in msdescription |
model.egLike | tei | tagdocs | all references are in tagdocs |
model.offsetLike | tei | namesdates | all references are in namesdates |
model.pPart.msdesc | tei | msdescription | all references are in msdescription |
model.ptrLike | tei | core | all references are in core |
model.lPart | tei | verse | all references are in verse |
model.gLike | tei | gaiji | all references are in gaiji |
model.oddDecl | tei | tagdocs | all references are in tagdocs |
model.oddRef | tei | tagdocs | all references are in tagdocs |
model.phrase.xml | tei | tagdocs | all references are in tagdocs |
model.specDescLike | tei | tagdocs | all references are in tagdocs |
model.headLike | tei | core | all references are in core |
model.labelLike | tei | core | all references are in core |
model.noteLike | tei | core | all references are in core |
model.lLike | tei | core | all references are in core |
model.featureVal.complex | tei | iso-fs | all references are in iso-fs |
model.featureVal.single | tei | iso-fs | all references are in iso-fs |
model.entryPart | tei | dictionaries | all references are in dictionaries |
model.eventLike | tei | namesdates | all references are in namesdates |
model.persStateLike | tei | namesdates | all references are in namesdates |
model.personLike | tei | namesdates | all references are in namesdates |
model.placeNamePart | tei | namesdates | all references are in namesdates |
model.availabilityPart | tei | header | all references are in header |
model.certLike | tei | certainty | all references are in certainty |
model.descLike | tei | core | all references are in core |
model.quoteLike | tei | core | all references are in core |
model.frontPart.drama | tei | drama | all references are in drama |
model.divBottomPart | tei | textstructure | all references are in textstructure |
model.catDescPart | tei | corpus | all references are in corpus |
model.textDescPart | tei | corpus | all references are in corpus |
model.castItemPart | tei | drama | all references are in drama |
model.divLike | tei | textstructure | all references are in textstructure |
model.divGenLike | tei | core | all references are in core |
model.div1Like | tei | textstructure | all references are in textstructure |
model.div2Like | tei | textstructure | all references are in textstructure |
model.div3Like | tei | textstructure | all references are in textstructure |
model.div4Like | tei | textstructure | all references are in textstructure |
model.div5Like | tei | textstructure | all references are in textstructure |
model.div6Like | tei | textstructure | all references are in textstructure |
model.div7Like | tei | textstructure | all references are in textstructure |
model.applicationLike | tei | header | all references are in header |
model.teiHeaderPart | tei | header | all references are in header |
model.sourceDescPart | tei | spoken | all references are in spoken |
model.editorialDeclPart | tei | header | all references are in header |
model.objectLike | tei | namesdates | all references are in namesdates |
model.orgStateLike | tei | namesdates | all references are in namesdates |
model.placeLike | tei | namesdates | all references are in namesdates |
Class | Current module | Target module | Reason |
---|---|---|---|
teidata.interval | tei | linking | all references are in linking |
teidata.namespaceOrName | tei | tagdocs | all references are in tagdocs |
teidata.point | tei | transcr | all references are in transcr |
teidata.replacement | tei | header | all references are in header |
teidata.sex | tei | namesdates | all references are in namesdates |
teidata.prefix | tei | header | all references are in header |
teidata.unboundedInt | tei | tagdocs | all references are in tagdocs |
teidata.point | tei | transcr | all references are in transcr |
teidata.temporal.iso | tei | namesdates | all references are in namesdates |
As asked by the subgroup, I have created an update list and summarized the proposed changes (previous comment)
@martinascholger since you're also assigned perhaps you can do a quick review? If it looks ok I can make this into a Pull Request
@raffazizzi looks good. I wonder if it might be a problem to move att.duration.w3c
, att.duration.iso
, and teidata.temporal.iso
because of their members'/references' membership.
Thanks @martinascholger !
att.duration
is already in spoken
even though it has members in core
. So att.duration.w3c
, att.duration.iso
are already only usable when spoken
is included, but they are defined on tei
.
teidata.temporal.iso
seems to only be used by att.datable.iso
, which is defined in namesdates
, so I think it makes sense to move teidata.temporal.iso
from tei
to namesdates
.
Let me know if I missed something.
Makes sense. Let me know if I can help with the PR.
Re-creation of @raffazizzi’s table based on v. 4.7.0a, revision b432af815.
move class | into module | change prose** |
---|---|---|
att.anchoring | core | no |
att.damaged | transcr | no |
att.breaking | core | no |
att.cReferencing | core | no |
att.duration.w3c | spoken | yes, remark in att.duration |
att.media | core | no |
att.interpLike | analysis | no |
att.measurement | core | no |
att.pointing.group | linking | no |
att.scoping | certainty | no |
att.styleDef | header | no |
att.citing | core | no |
att.formula | header | no |
att.partials | dictionaries | no |
att.duration.iso | spoken | no |
att.enjamb | core | this should remain in the 'verse' module |
model.hiLike | core | no |
model.dateLike | core | no |
model.dimLike | msdescription | no |
model.egLike | tagdocs | no |
model.offsetLike | namesdates | no |
model.pPart.msdesc | msdescription | no |
model.lPart | verse | no |
model.gLike | gaiji | no |
model.oddDecl | tagdocs | no |
model.oddRef | tagdocs | no |
model.phrase.xml | tagdocs | no |
model.specDescLike | tagdocs | no |
model.headLike | core | no |
model.labelLike | core | no |
model.noteLike | core | no |
model.lLike | core | no |
model.featureVal.complex | iso-fs | no |
model.featureVal.single | iso-fs | no |
model.entryPart | dictionaries | no |
model.eventLike | namesdates | no |
model.persStateLike | namesdates | |
model.personLike | namesdates | |
model.placeNamePart | namesdates | |
model.availabilityPart | header | |
model.certLike | certainty | |
model.descLike | core | |
model.quoteLike | core | |
model.frontPart.drama | drama | |
model.divBottomPart | textstructure | |
model.catDescPart | corpus | |
model.textDescPart | corpus | |
model.castItemPart | drama | |
model.divLike | textstructure | |
model.divGenLike | core | |
model.div1Like | textstructure | |
model.div2Like | textstructure | |
model.div3Like | textstructure | |
model.div4Like | textstructure | |
model.div5Like | textstructure | |
model.div6Like | textstructure | |
model.div7Like | textstructure | |
model.annotationPart.body | core | |
model.applicationLike | header | |
model.teiHeaderPart | header | |
model.sourceDescPart | spoken | |
model.editorialDeclPart | header | |
model.objectLike | namesdates | |
model.orgStateLike | namesdates | |
model.placeLike | namesdates | |
teidata.interval | linking | |
teidata.temporal.working | tagdocs | |
teidata.namespaceOrName | tagdocs | |
teidata.point | transcr | |
teidata.authority | drama | |
teidata.replacement | header | |
teidata.prefix | header | |
teidata.unboundedCount | tagdocs | |
teidata.point | transcr | |
teidata.point | transcr | |
teidata.temporal.iso | namesdates | |
macro.abContent | linking |
This section needs an update: https://tei-c.org/release/doc/tei-p5-doc/en/html/ST.html#STECAT, especially "Some attribute classes are defined within the tei infrastructural module and are thus globally available. Other attribute classes are specific to particular modules and thus defined in other chapters. Attributes defined by such classes will not be available unless the module concerned is included in a schema."
F2F@Guelph: @sydb had a solid strategy to make sure this can be implemented without breaking DTDs. However, the group discussed it again and is struggling to see the benefits given the amount of work needed. Interestingly, making this change would make selection via <elementRef>
miss even more classes if they get moved away from the tei
module.
We also think that with #1744 there will be an opportunity to revisit the class system and its relationship to modules. So we are blocking this issue until we resolve #1744 and the futures issues emerging from it (as listed on that ticket).
Many classes, macros, and datatypes are in the
tei
module, but they may be placed in more appropriate modules. Below is a list classes, macros, and datatypes grouped by modules and paired with modules they )are referenced in. (The list is generated with this badly written XSLT on p5subset.xml)We probably should keep in
tei
all those classes that are shared between multiple modules. Also note some datatypes and macros that appear to not be used anywhere (not a bad thing per se)ATTRIBUTE CLASSES
tei
namesdates
spoken
core
header
verse
dictionaries
msdescription
transcr
textcrit
linking
analysis
tagdocs
MODEL CLASSES
tei
spoken
textcrit
namesdates
dictionaries
tagdocs
DATATYPES
tei
MACROS
tei
tagdocs