SAA-SDT / eac-cpf-schema

https://eac.staatsbibliothek-berlin.de/

<description> #138

Closed SJagodzinski closed 2 years ago

SJagodzinski commented 4 years ago

Description

Creator of issue

  1. Silke Jagodzinski
  2. TS-EAS: EAC-CPF subgroup
  3. silkejagodzinski@gmail.com

Related issues / documents

Role and function of singular/plural elements is confusing #21

EAD3 Reconciliation

EAC-CPF specific element

Context

A wrapper for all of the content elements comprising description of the CPF entity described in the EAC-CPF instance.

May contain: <biogHist>, <existDates>, <function>, <functions>, <generalContext>, <languageUsed>, <languagesUsed>, <legalStatus>, <legalStatuses>, <localDescription>, <localDescriptions>, <mandate>, <mandates>, <occupation>, <occupations>, <place>, <places>, <structureOrGenealogy>

May occur within: <cpfDescription>

Attributes: @xml:base, @xml:id, @xml:lang

Availability: Optional, Non-repeatable

Solution documentation: agreed solution for TL and guidelines

Rephrasing Summary, Description and Usage and Attribute usage needed?

May contain: <biogHist>, <existDates>, <functions>, <generalContext>, <languagesUsed>, <legalStatuses>, <localDescriptions>, <mandates>, <occupations>, <places>, <structureOrGenealogy>

May occur within: <cpfDescription>

Attributes:

  • @audience - optional (values limited to: external, internal)
  • @base - optional
  • @conventionDeclarationReference - optional
  • @id - optional
  • @languageOfElement - optional
  • @maintenanceEventReference - optional
  • @scriptOfElement - optional
  • @sourceReference - optional

Availability: Optional, Non-repeatable

In EAC-CPF 2010 there are formal and informal descriptive elements within <description>. Formal descriptive elements can be bundled within their plural element and they can also be repeated as singular elements, e.g. <places> and <place>. The informal descriptive elements are <structureOrGenealogy>, <generalContext>, <biogHist>. They are repeatable but can't be bundled in a plural element.

Repeat child elements if you describe the same function in different languages.

The distinction between formal and informal descriptive elements will be part of the documentation but not included in the schema.

Example encoding

<eac-cpf>
    <control>[...]</control>
    <cpfDescription>
        <identity>[...]</identity>
        <description audience="external" base="baseURL" conventionDeclarationReference="conventiondeclaration1" id="description1" languageOfElement="en" maintenanceEventReference="maintenancevent1" scriptOfElement="lat" sourceReference="source1">
            <biogHist>[...]</biogHist>
            <existDates>[...]</existDates>
            <functions>[...]</functions>
            <generalContext>[...]</generalContext>
            <languagesUsed>[...]
                <languageUsed>
                    <language languageCode="">[...]</language>
                    <writingSystem scriptCode=""></writingSystem>
                </languageUsed>
            </languagesUsed>
            <legalStatuses>[...]</legalStatuses>
            <localDescriptions localType="">[...]</localDescriptions>
            <mandates>[...]</mandates>
            <occupations>[...]</occupations>
            <places>[...]</places>
            <structureOrGenealogy>[...]</structureOrGenealogy>
        </description>
        <relations>[...]</relations>
        <alternativeSet>[...]</alternativeSet>
    </cpfDescription>
</eac-cpf>

ailie-s commented 3 years ago

Tag Library Text:

Summary: An optional child element of <cpfDescription>, <description> is a wrapper element for all of the content elements comprising the description of the CPF entity described in the EAC-CPF instance.

May contain: biogHist (0..n), existDates (0..n), functions (0..n), generalContext (0..n), languagesUsed (0..n), legalStatuses (0..n), localDescriptions (0..n), mandates (0..n), occupations (0..n), places (0..n), structureOrGenealogy (0..n)

May occur within: cpfDescription

Attributes:

  • @audience - optional (values limited to: external, internal)
  • @base - optional
  • @conventionDeclarationReference - optional
  • @id - optional
  • @languageOfElement - optional
  • @maintenanceEventReference - optional
  • @scriptOfElement - optional
  • @sourceReference - optional

Description and Usage: The elements that constitute <description> together permit descriptive information to be encoded in either structured or unstructured fashions, or in a combined approach. <description> accommodates the encoding of all the data elements that comprise the Description Area of ISAAR (CPF), including historical, biographical, and genealogical information; legal status and mandates; functions, occupations, and activities; and the dates and places that further constrain those elements.

Availability: Optional, not repeatable

karinbredenberg commented 3 years ago

Element description tested:

May occur within: cpfDescription (0..1): ok in both schemas

Child elements:

  • biogHist (0..n): ok in both schemas
  • existDates (0..1): in both schemas the cardinality is 0..n
  • functions (0..n): ok in both schemas
  • generalContext (0..n): ok in both schemas
  • languagesUsed (0..n): ok in both schemas
  • legalStatuses (0..n): ok in both schemas
  • localDescriptions (0..n): ok in both schemas
  • mandates (0..n): ok in both schemas
  • occupations (0..n): ok in both schemas
  • places (0..n): ok in both schemas
  • structureOrGenealogy (0..n): ok in both schemas

Attributes:

  • @audience - optional (values limited to: external, internal): ok in both schemas
  • @base - optional: ok in both schemas
  • @conventionDeclarationReference - optional: ok in both schemas
  • @id - optional: ok in both schemas
  • @languageOfElement - optional: ok in both schemas
  • @maintenanceEventReference - optional: ok in both schemas
  • @scriptOfElement - optional: ok in both schemas
  • @sourceReference - optional: ok in both schemas

Result: The cardinality of existDates needs to be changed in both schemas. Connected with #144; cardinality still to be resolved.

karinbredenberg commented 3 years ago

Retested. Cardinality of existDates is 0..n in both schemas.

fordmadox commented 3 years ago

Note: the description above lists the plural elements within description as 0..n (e.g. functions (0..n)), but the corresponding issues for those plural elements list an availability of 0..1.

There is a technical issue with trying to mix 0..1 and 0..n elements in the same group when converting to XSD 1.0 if we want those elements to appear in any order. Although there is a workaround to that issue, as far as I can tell, it would not be very maintainable if we continue to define our schemas first in RNG (nor does it scale well, depending on the number of elements that need to be mixed), and it certainly violates our principle of simplicity.

Due to that issue, @SJagodzinski, we should either:

The same issue applies to the control element, though in that case, I think it makes the most sense to keep things as is, and enforce an order (since I do think it is simpler, for people at least, to always expect the recordId element to appear first, the singular descriptiveNote element to be last when present, etc.). And, heck, the name of the element is "control", which implies that it should be orderly and predictable :smile:

That said, for both "control" and "description", I'd likely prefer consistency for how those elements are defined, rather than allowing them to be so flexible.

In case it helps, here's how the elements in the control element can be defined in any order (even when mixing different cardinalities) using XSD 1.1: https://github.com/SAA-SDT/eac-cpf-schema/blob/dev-interleave-tests/xml-schemas/eac-cpf/cpf-1.1-example.xsd#L63-L78 (where the comments indicate the cardinality)

I should also add that I'm no longer in favor (as I previously was) of allowing the elements to always be in any order whatsoever. Although this is easy to define in the RNG schema, I think that doing so ultimately violates our principle of simplicity (especially for anyone who still might be hand-editing XML files, since the software suggestions are a LOT less useful when the elements can appear in any order whatsoever, and when it comes to software implementors, there should be no difficulty in enforcing custom sorting operations as data is added).

I'll think a bit more about different options and a possible direction, but I'd probably lean toward:

The only thing I'm still slightly concerned about is data loss with the migration from EAC 1 to 2, especially since the former can have multiple pluralized elements in the description section, but I doubt that that exists too much in real-world files. That said, I haven't checked into that yet, but will do so before the EAC team's January meeting.

fordmadox commented 3 years ago

Anyhow, I've updated the RNG and XSD schemas so that the plural elements cannot repeat (whereas they could before in the XSD schema). But, I've also added an order to do that for the time being. Right now, the prescribed order in the current branch of the development schema for the description element is as follows:

  1. existDates (0..N)
  2. pluralized elements (0..1), in any order:
    • functions
    • languagesUsed
    • legalStatuses
    • localDescriptions
    • mandates
    • occupations
    • places
  3. narrative elements (0..N), in any order:
    • biogHist
    • generalContext
    • structureOrGenealogy

That is slightly different from EAC 1, however, where biogHist had to be the last element here, if present.

SJagodzinski commented 3 years ago

@fordmadox : I agree with your solution. Plural elements, except <existDates>, are not repeatable, as we agreed last year.

In my opinion, the order of elements is a decision for the Schema Team. There should be a common approach across all EAS schemas, and it must be comprehensible for users. The order should be explained in our documentation.

fordmadox commented 3 years ago

Just as an update, we cannot go with any order of the 0..1 elements in XSD 1.0, so scratch that "in any order" after number 2 in my previous comment. If there were only a few it wouldn't matter, but with 7, I think that means we'd be on the hook for having a choice option with 5,040 different possible orders, so that won't work.
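As a quick check of the 5,040 figure, here is a minimal Python sketch that counts the orderings of the seven non-repeatable plural elements when all of them are present. The list name `PLURAL_ELEMENTS` is only illustrative, and the sketch says nothing about how an XSD 1.0 content model would actually be written.

```python
from itertools import permutations
from math import factorial

# The seven non-repeatable plural elements named earlier in this thread.
PLURAL_ELEMENTS = [
    "functions", "languagesUsed", "legalStatuses", "localDescriptions",
    "mandates", "occupations", "places",
]

# Number of distinct orderings when all seven elements are present: 7!
orderings_all_present = factorial(len(PLURAL_ELEMENTS))

# Cross-check the closed form by brute-force enumeration.
enumerated = sum(1 for _ in permutations(PLURAL_ELEMENTS))

print(orderings_all_present, enumerated)
```

Each additional element multiplies the count by the new list length, which is why enumerating every order as a choice does not scale.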

For the time being, I've put the non-repeatable values first, in alphabetic order (e.g. functions --> places). I moved "existDates" down to be with the rest of the repeatable options, but there would be no problem in keeping that element first, if desired. But I would prefer not to have it first unless it were required, or a formatting element, like "head".

I'm going to investigate now how many instances of the newly non-repeatable elements (e.g. mandates) there might be that repeat in the EAC 1.0 testbed. Hopefully there aren't any, but if there are, then we'll definitely need to think through the upgrade implications of that.

fordmadox commented 3 years ago

Update: I have zero instances of repeatable plural groups in the real-world EAC set of documents! There are plenty of repeats of elements like "function" (but no repeats of "functions") within the same document, at least.
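For anyone who wants to run a similar check on their own files, the following is a rough sketch of one way to look for repeated plural wrappers using only the Python standard library. It is not the script used for the testbed; the function names `local_name` and `repeated_wrappers` and the stripped-down `SAMPLE` record (not schema-valid, just enough structure to show the check) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Plural wrapper elements that are meant to be non-repeatable.
PLURAL_WRAPPERS = {
    "functions", "languagesUsed", "legalStatuses", "localDescriptions",
    "mandates", "occupations", "places",
}


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def repeated_wrappers(xml_text):
    """Return {wrapper name: count} for wrappers repeated inside a <description>."""
    root = ET.fromstring(xml_text)
    repeated = {}
    for elem in root.iter():
        if local_name(elem.tag) != "description":
            continue
        counts = Counter(local_name(child.tag) for child in elem)
        for name, n in counts.items():
            if name in PLURAL_WRAPPERS and n > 1:
                repeated[name] = n
    return repeated


# Hypothetical record: two <places> wrappers, one <functions> wrapper.
SAMPLE = """<eac-cpf><cpfDescription><description>
  <functions><function/><function/></functions>
  <places><place/></places>
  <places><place/></places>
</description></cpfDescription></eac-cpf>"""

print(repeated_wrappers(SAMPLE))
```

Repeated singular elements inside one wrapper (the two `function` children here) are deliberately not reported, since only the wrappers become non-repeatable.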

I'll still need to provide some sort of upgrade path for this, but I think it should be fine to just group them (and any descriptiveNote children) during the automatic upgrade process (and we could add an alert for when that is actually triggered, since if someone really did have, say, two different wrapper "functions" elements in their EAC 1.0 record, they might want to investigate the result of what that looks like / means when the elements are consolidated into a single one).
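The grouping idea could look roughly like the sketch below. This is not the actual upgrade transformation: the `consolidate` function is hypothetical, the sample record has no namespace, children are simply appended in document order without any reordering of descriptiveNote, and any attributes or text on the removed wrappers are dropped.

```python
import xml.etree.ElementTree as ET


def consolidate(description, wrapper_name):
    """Merge repeated <wrapper_name> children of `description` into the first one.

    Returns True when a merge happened, so the caller can raise an alert.
    """
    wrappers = description.findall(wrapper_name)
    if len(wrappers) < 2:
        return False
    first = wrappers[0]
    for extra in wrappers[1:]:
        # Move every child of the later wrapper into the first wrapper.
        first.extend(list(extra))
        description.remove(extra)
    return True


# Hypothetical EAC 1.0-style fragment with two <functions> wrappers.
description = ET.fromstring(
    "<description>"
    "<functions><function>a</function></functions>"
    "<functions><function>b</function>"
    "<descriptiveNote>n</descriptiveNote></functions>"
    "</description>"
)

if consolidate(description, "functions"):
    print("alert: merged repeated <functions> wrappers; please review the result")

print(ET.tostring(description, encoding="unicode"))
```

Returning a flag rather than merging silently leaves room for the alert mentioned above, so that a record owner can inspect what the consolidated wrapper ended up containing.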

kerstarno commented 3 years ago

@fordmadox - with the latest status of the conversation around elements' order, I was wondering if this specific issue here could be considered as "done" for the part of the Schema Team?

fordmadox commented 3 years ago

Yes, I think that this issue can be considered done, @kerstarno. Is the current order acceptable, @SJagodzinski, or should it be adjusted before the call for comments?

I do not like the workaround that we need to employ just for XSD 1.0, but I don't think there is any desire to a) make the plural elements repeatable, nor a desire to b) not deliver the XSD variant at all, nor a proposal to c) create a non-repeatable wrapper element that would house existDates, biogHist, generalContext, and structureOrGenealogy. As for that last option, though, I do think it might be useful elsewhere (e.g. https://github.com/SAA-SDT/eac-cpf-schema/commit/eb96be93a1d863aff95ddf016bc39f5767efa2f2, which is a simple example to illustrate that we could still group recordId and otherRecordId despite the new ordering rules)

fordmadox commented 3 years ago

And just to be clear, going with the new rules for ordering elements, here's what that looks like right now in the "description" element:

SJagodzinski commented 3 years ago

Is the current order acceptable, @SJagodzinski, or should it be adjusted before the call for comments?

@fordmadox : yes, the current order is fine and does not need to be changed.

I do not like the workaround that we need to employ just for XSD 1.0, but I don't think there is any desire to a) make the plural elements repeatable, nor a desire to b) not deliver the XSD variant at all, nor a proposal to c) create a non-repeatable wrapper element that would house existDates, biogHist, generalContext, and structureOrGenealogy.

I agree here as well; I don't see a need for these options.