SAA-SDT / eac-cpf-schema

https://eac.staatsbibliothek-berlin.de/

<maintenanceAgency> #86

Closed: SJagodzinski closed this issue 2 years ago

SJagodzinski commented 4 years ago

Maintenance Agency

Creator of issue

  1. Silke Jagodzinski
  2. TS-EAS: EAC-CPF subgroup
  3. silkejagodzinski@gmail.com

Related issues / documents

<conventionDeclaration>: add sub-elements #67

EAD3 Reconciliation

Additional EAD3 attributes:

  • @altrender - Optional
  • @audience - Optional (values limited to: external, internal)
  • @countrycode - Optional
  • @encodinganalog - Optional
  • @lang - Optional
  • @script - Optional

Context

The institution or service responsible for the creation, maintenance, and/or dissemination of the EAC-CPF instance.

May contain: <agencyCode>, <agencyName>, <descriptiveNote>, <otherAgencyCode>
May occur within: <control>
Attributes: @xml:id - Optional
Availability: Mandatory, Non-repeatable

Solution documentation

Do the Summary, Description and Usage, and Attribute Usage sections need rephrasing?

May contain: <agencyCode>, <agencyName>, <descriptiveNote>, <otherAgencyCode>
May occur within: <control>
Attributes:
  • @audience - optional (values limited to: external, internal)
  • @countryCode - optional
  • @id - optional
  • @languageOfElement - optional
  • @scriptOfElement - optional
  • @valueURI - optional
  • @vocabularySource - optional
  • @vocabularySourceURI - optional

Availability: Required, not repeatable

Example encoding

<control>
 <recordId>record identifier</recordId>
 <maintenanceStatus value="new"/>
 <publicationStatus value="inprocess"/>
 <maintenanceAgency audience="external" countryCode="DE" id="maintenanceagency1" languageOfElement="eng" scriptOfElement="lat" vocabularySource="ISIL_Agency" vocabularySourceURI="http://ld.zdb-services.de/resource/organisations/" valueURI="http://ld.zdb-services.de/resource/organisations/XYZ">
  <agencyCode>agency code</agencyCode>
  <otherAgencyCode>other agency code</otherAgencyCode>
  <agencyName>agency name</agencyName>
 </maintenanceAgency>
 <maintenanceHistory>
  [...]
 </maintenanceHistory>
</control>
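The attribute list and the restriction on @audience above can be read as a small validation rule. The following Python sketch (standard library only) checks one <control> element against them; it assumes un-namespaced elements, as in the example encoding, and is only an illustration, not part of the EAC-CPF schemas.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Attribute names proposed for <maintenanceAgency> in the EAC-CPF 2.0 draft above.
ALLOWED_ATTRIBUTES = {
    "audience", "countryCode", "id", "languageOfElement", "scriptOfElement",
    "valueURI", "vocabularySource", "vocabularySourceURI",
}
# @audience is limited to these two values.
AUDIENCE_VALUES = {"external", "internal"}


def check_maintenance_agency_attributes(control: ET.Element) -> list[str]:
    """Return problems found on the single <maintenanceAgency> inside <control>."""
    agencies = control.findall("maintenanceAgency")
    if len(agencies) != 1:
        # Required and not repeatable.
        return [f"expected exactly one <maintenanceAgency>, found {len(agencies)}"]
    problems = []
    for name, value in agencies[0].attrib.items():
        if name not in ALLOWED_ATTRIBUTES:
            problems.append(f"unexpected attribute @{name}")
        elif name == "audience" and value not in AUDIENCE_VALUES:
            problems.append(f"invalid @audience value {value!r}")
    return problems


# Usage with a trimmed version of the example encoding (no namespace, as above).
sample = ET.fromstring(
    '<control><maintenanceAgency audience="external" countryCode="DE">'
    "<agencyName>agency name</agencyName></maintenanceAgency></control>"
)
print(check_maintenance_agency_attributes(sample))  # []
```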
ailie-s commented 4 years ago

Tag Library Text:

Summary: A required child element of <control> that identifies the institution or service responsible for the EAC-CPF instance.

May contain: <agencyCode> (0..1), <agencyName> (1..n), <descriptiveNote> (0..1), <otherAgencyCode> (0..n)

May occur within: <control>

Description and Usage: Information about the institution or service responsible for the creation, maintenance, and/or dissemination of the EAC-CPF instance. <maintenanceAgency> must include a child <agencyName> to provide the name of the institution or service. It is recommended to include the optional <agencyCode> and/or <otherAgencyCode> children to unambiguously identify the institution or service. Any general information about the institution in relation to the EAC-CPF instance may be given in <descriptiveNote>.

Attributes:
  • @audience - optional (values limited to: external, internal)
  • @countryCode - optional
  • @id - optional
  • @languageOfElement - optional
  • @scriptOfElement - optional

Attribute Usage: Use @countryCode to indicate a unique code for the country of the maintenance agency.

Availability: Required, not repeatable
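The cardinalities listed in the Tag Library text can be checked mechanically. A minimal Python sketch, assuming un-namespaced elements; the table simply restates the (min, max) values given above, and the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# (min, max) occurrences from the Tag Library draft; None means unbounded.
CARDINALITY = {
    "agencyCode": (0, 1),
    "agencyName": (1, None),
    "descriptiveNote": (0, 1),
    "otherAgencyCode": (0, None),
}


def check_cardinality(agency: ET.Element) -> list[str]:
    """Compare the children of <maintenanceAgency> with the Tag Library cardinalities."""
    problems = []
    for child in agency:
        if child.tag not in CARDINALITY:
            problems.append(f"unexpected child <{child.tag}>")
    for name, (low, high) in CARDINALITY.items():
        count = len(agency.findall(name))  # direct children only
        if count < low:
            problems.append(f"<{name}> occurs {count} time(s), at least {low} required")
        if high is not None and count > high:
            problems.append(f"<{name}> occurs {count} time(s), at most {high} allowed")
    return problems
```

Repeating <agencyName> (for multilingual names) passes, while a second <agencyCode> is reported, since additional codes belong in <otherAgencyCode>.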

SJagodzinski commented 3 years ago

@fordmadox: To avoid an empty mandatory wrapper element in <control>, I suggest making one of the elements <agencyCode> or <agencyName> mandatory by using xs:choice.

Attention: <agencyCode> must not be repeatable (we use <otherAgencyCode> for that purpose), whereas <agencyName> is repeatable for multilingualism.

kerstarno commented 3 years ago

Isn't <agencyName> mandatory already? Or is the suggested change mainly about having either <agencyName> or <agencyCode> as mandatory sub-element? Just to clarify so that I can keep track accordingly for EAD.

fordmadox commented 3 years ago

<agencyName> is currently required in both EAC and EAD3, so <maintenanceAgency> should never be an empty element right now. If the suggestion is to make it possible to supply an <agencyCode> in place of the agencyName, however, can you add a new issue for that @SJagodzinski ?

kerstarno commented 3 years ago

Schema team, schema tests:

I've tested the draft XSD and RNG schemas for EAC-CPF 2.0 with regard to "maintenanceAgency" and can confirm:

With regard to the sub-elements of <maintenanceAgency>, however, the two schema variants are currently using different definitions:

The source file for the "control" module (https://github.com/SAA-SDT/eac-cpf-schema/blob/development/source/modules/control.rng) seems to be fine, i.e. it follows the first definition, so I'm not sure where the bug is introduced in the XSD. Maybe something in the build (https://github.com/SAA-SDT/eac-cpf-schema/tree/development/build)? For now, I have created a pull request to fix the XSD so that it matches the RNG: https://github.com/SAA-SDT/eac-cpf-schema/pull/164

kerstarno commented 3 years ago

Retested and can confirm that the occurrence of sub-elements in <maintenanceAgency> is now correct in both XSD and RNG, i.e.:

Schema tests have now been successful in all aspects.

Pending question for EAC-CPF team regarding the sequence of sub-elements, which currently prescribes the optional <agencyCode> and <otherAgencyCode> to appear before the required <agencyName>.

SJagodzinski commented 3 years ago

Quoting @kerstarno:

Pending question for EAC-CPF team regarding the sequence of sub-elements, which currently prescribes the optional <agencyCode> and <otherAgencyCode> to appear before the required <agencyName>.

I think we agreed to follow @fordmadox's proposal to avoid a fixed order for child elements in EAS.

kerstarno commented 3 years ago

Cardinality change as a result of the latest change in #88.

SJagodzinski commented 3 years ago

I suggest providing a choice so that either <agencyName> or <agencyCode> is available. If only one of the elements is present, it has to have content.

kerstarno commented 3 years ago

@fordmadox - this will need redoing in the schema. Please let me know when you've had time to implement the change, and I'll re-test.

fordmadox commented 3 years ago

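@SJagodzinski @kerstarno: if I understand correctly, the current proposal is:

  • maintenanceAgency is a required element that cannot repeat.
  • it must contain either an agencyCode element OR an agencyName element.
  • those two elements should always contain some text when they are present.

Is that right?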

fordmadox commented 3 years ago

As an aside: is there any need to keep an element like "otherAgencyCode", or couldn't we just allow "agencyCode" to repeat? I feel the same way about the recordId/otherRecordId distinction, but I'm especially interested in this right now since we are modeling a choice between a non-repeatable element and a repeatable element, which I think just makes things confusing for users. Why no otherAgencyName element, for instance? (which would be a horrible addition, so I'm not suggesting such an element 😄)

kerstarno commented 3 years ago

Quoting @fordmadox:

@SJagodzinski @kerstarno: if I understand correctly, the current proposal is:

  • maintenanceAgency is a required element that cannot repeat.
  • it must contain either an agencyCode element OR an agencyName element.
  • those two elements should always contain some text when they are present.

Is that right?

Yes to the first two aspects, but if I understood @SJagodzinski correctly, the idea would be to require content only if just one of the elements is present, i.e. the following would be valid without requiring content in <agencyCode>:

<maintenanceAgency>
  <agencyCode/>
  <agencyName>Archives of Wonderland</agencyName>
</maintenanceAgency>

while one would need to have content in <agencyCode> if it were used by itself, i.e.

<maintenanceAgency>
  <agencyCode>DE-1958</agencyCode>
</maintenanceAgency>
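The rule in the two examples above depends on siblings: whether an empty <agencyCode> is acceptable is decided by whether an <agencyName> is also present. A minimal Python sketch of that check, assuming un-namespaced elements, treating whitespace-only content as empty, and (as an assumption) requiring every element to have content when only one element type is present. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def has_text(element: ET.Element) -> bool:
    """True if the element contains non-whitespace text, including in descendants."""
    return "".join(element.itertext()).strip() != ""


def check_sibling_dependent_rule(agency: ET.Element) -> bool:
    """Sketch of the proposal: content is required only when one element type stands alone.

    - At least one <agencyCode> or <agencyName> must be present.
    - If both types are present, either may be empty.
    - If only one type is present, each of its elements must have content (assumption).
    """
    codes = agency.findall("agencyCode")
    names = agency.findall("agencyName")
    if not codes and not names:
        return False
    if codes and names:
        return True
    present = codes or names
    return all(has_text(el) for el in present)
```

The dependency on siblings is what makes this awkward for a grammar-based schema: the same element name ends up with two different content models depending on its context.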
kerstarno commented 3 years ago

Quoting @fordmadox:

As an aside: is there any need to keep an element like "otherAgencyCode", or couldn't we just allow "agencyCode" to repeat? I feel the same way about the recordId/otherRecordId distinction, but I'm especially interested in this right now since we are modeling a choice between a non-repeatable element and a repeatable element, which I think just makes things confusing for users. Why no otherAgencyName element, for instance? (which would be a horrible addition, so I'm not suggesting such an element 😄)

As for the aside: for <agencyCode>, it is currently recommended to follow the standard defined in @repositoryEncoding, ideally ISO 15511. This is to ensure that an EAS instance ideally includes a globally unique identifier of the institution responsible for its creation. <otherAgencyCode>, like <agencyName>, can have any format, which is why both are repeatable, even if only a minority of cases might actually repeat <agencyName> or add an <otherAgencyCode>. For the latter, though, I know of various cases in the context of Archives Portal Europe where it is welcome to be able to provide national identifiers for the institutions alongside the ISO 15511 one.
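For illustration, a rough plausibility check for ISO 15511 (ISIL) values in <agencyCode>, such as "DE-1958". The pattern encodes only an approximation of the general ISIL shape, assumed here to be a prefix of one to four Latin letters, a hyphen, then a local identifier of letters, digits, "-", "/" or ":", with at most 16 characters overall. It consults no registry and is not the normative ISO 15511 grammar. <otherAgencyCode> values are deliberately not checked, since they may take any format.

```python
import re

# Approximate ISIL shape (assumption, not the normative ISO 15511 definition):
# 1-4 Latin letters as prefix, a hyphen, then letters, digits, ":", "/" or "-".
ISIL_PATTERN = re.compile(r"[A-Za-z]{1,4}-[A-Za-z0-9:/-]+")
ISIL_MAX_LENGTH = 16


def looks_like_isil(code: str) -> bool:
    """Rough plausibility check for an <agencyCode> value."""
    return len(code) <= ISIL_MAX_LENGTH and ISIL_PATTERN.fullmatch(code) is not None
```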

fordmadox commented 3 years ago

Such a construction can work in RNG, but I don't believe there is any way to do that in XSD. See: https://www.w3.org/TR/xmlschema-1/#cos-element-consistent

fordmadox commented 3 years ago

Also, I'd say that's a good rule in general, despite the flexibility of RNG. In other words, we should not define an element with the same name and give it a different content model (e.g. in one instance it requires text, and in another it does not).

kerstarno commented 3 years ago

Not sure what you mean: where would we have an element with the same name used with different content models in different contexts?

fordmadox commented 3 years ago

In your example above, if you're saying that "agencyCode" can be empty in the first example, but that it must have a text node in the second example, then I believe that would violate the "Element Declarations Consistent" constraint in XSD.

kerstarno commented 3 years ago

Ah, ok. Now I see.

Well, this was mainly my interpretation of @SJagodzinski's suggestion. Personally, I'd be fine in saying: "whenever you use either <agencyCode> or <agencyName>, they cannot be empty, and you have to at least use one of them."

I.e.

<maintenanceAgency>
  <agencyCode>WL-111</agencyCode>
  <agencyName>Archives of Wonderland</agencyName>
</maintenanceAgency>

or

<maintenanceAgency>
  <agencyCode>WL-111</agencyCode>
</maintenanceAgency>

or

<maintenanceAgency>
  <agencyName>Archives of Wonderland</agencyName>
</maintenanceAgency>

plus, if we were to go with the <part> approach for <agencyName>,

<maintenanceAgency>
  <agencyName>-</agencyName>
</maintenanceAgency>
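The rule just described does not depend on siblings: every <agencyCode> or <agencyName> that is used must have text, and <maintenanceAgency> must contain at least one of them. A Python sketch of the same logic, assuming un-namespaced elements and counting text inside child elements (such as a <part>) as content; the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def is_non_empty(element: ET.Element) -> bool:
    """True if the element has non-whitespace text anywhere inside it."""
    return "".join(element.itertext()).strip() != ""


def check_non_empty_when_used(agency: ET.Element) -> list[str]:
    """At least one <agencyCode>/<agencyName> must be present, and none may be empty."""
    used = agency.findall("agencyCode") + agency.findall("agencyName")
    if not used:
        return ["<maintenanceAgency> needs an <agencyCode> or an <agencyName>"]
    return [f"<{el.tag}> must not be empty" for el in used if not is_non_empty(el)]
```

Under this rule, the earlier example with an empty <agencyCode> next to a filled <agencyName> is rejected, while each of the three examples just above passes.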
fordmadox commented 3 years ago

I think that's the best we can do with the XSD, if we want to enforce that one is present and non-empty. We could instead have a rule in the Schematron, but it would amount to the same thing, I suspect.

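Anyhow, for testing, @SJagodzinski and @kerstarno, I've updated the base schemas so that:

  • maintenanceAgency is required again;
  • since it is required and not optional, the order has been moved up (and it occurs before maintenanceHistory, just like in the current EAC);
  • the maintenanceAgency element must contain at least one agencyName OR one agencyCode element.
  • agencyName and agencyCode are now defined as non-empty elements.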

fordmadox commented 3 years ago

If this approach is agreeable, then I can update the conversion process and re-generate the EAC 2 sample files: https://github.com/SAA-SDT/eac1-to-eac2-conversion/tree/main/sample-files/output (these files are all invalid now, but it is very easy to re-generate them if this approach works)

If this approach is not agreeable, just let me know what we should explore next. But we cannot allow the same element to be empty in some instances but be forced to have a text node in another instance, at least not in the XSD schema, although we could add such a rule to the Schematron if that approach is preferred. That said, I think that such a variable approach would be more confusing than the other options.

kerstarno commented 3 years ago

Re-tested:

The test results above apply to both schemas, RNG and XSD.

(@fordmadox and @SJagodzinski I'll update #87 and #88 accordingly, once the above is confirmed.)

fordmadox commented 3 years ago

Regarding that last point: we can easily enforce the order so that agencyName and otherAgencyCode cannot be mixed. So, just to be clear, that choice is not made for us by the XSD-serialization process. I added it that way since that is how it was defined in the previous draft branch approach (and since that type of flexibility is still possible in the XSD), but it's just as easy to have a strict order for all 4 possible child elements of maintenanceAgency if that is desired. I'll make a new push now for that option.

Just let me know which of the following two options is preferred, and I will keep whichever commit that is in the base directory.

Allow agencyName and otherAgencyCode to be interleaved example: https://github.com/SAA-SDT/eac-cpf-schema/tree/9e8c636632dac7d8016ba784fa4bf3b8e4bf9c5a/xml-schemas/eac-cpf

Require a strict order for all children elements of maintenanceAgency examples: https://github.com/SAA-SDT/eac-cpf-schema/tree/006f02026763bce62fbf1b50f12d273271cf58a3/xml-schemas/eac-cpf
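The practical difference between the two commits is whether a document that interleaves <agencyName> and <otherAgencyCode> is valid. A small Python sketch of the strict-order test; the order used here is an assumption for illustration (the real one is whatever the chosen commit defines), and the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical strict order, for illustration only. Codes before <agencyName>
# mirrors the sequence discussed earlier in this issue; <descriptiveNote> last is assumed.
STRICT_ORDER = ("agencyCode", "otherAgencyCode", "agencyName", "descriptiveNote")


def follows_strict_order(agency: ET.Element, order: tuple[str, ...] = STRICT_ORDER) -> bool:
    """True if the known children of <maintenanceAgency> never step back in `order`."""
    rank = {name: i for i, name in enumerate(order)}
    # Children not listed in `order` are ignored in this sketch.
    positions = [rank[child.tag] for child in agency if child.tag in rank]
    return positions == sorted(positions)


interleaved = ET.fromstring(
    "<maintenanceAgency><agencyCode>WL-111</agencyCode><agencyName>A</agencyName>"
    "<otherAgencyCode>X</otherAgencyCode><agencyName>B</agencyName></maintenanceAgency>"
)
print(follows_strict_order(interleaved))  # False
```

The interleaved document would be accepted by the first option and rejected by the second.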

SJagodzinski commented 3 years ago

@fordmadox

[...] I've updated the base schemas so that:

* maintenanceAgency is required again;

Right

* since it is required and not optional, the order has been moved up (and it occurs before maintenanceHistory, just like in the current EAC);

Right

* the maintenanceAgency element must contain at least one agencyName OR one agencyCode element.

Right

* agencyName and agencyCode are now defined as non-empty elements.

Well... I don't think this approach is user-friendly, but given the XSD limitation, I of course agree. My idea was indeed to force content only if one of the elements is present on its own; e.g. it would be fine to have an agency code but an empty agency name.

fordmadox commented 3 years ago

Here's a third approach to test: https://github.com/SAA-SDT/eac-cpf-schema/tree/93f6ed6401c9fa85fa8fed5bb440cec47a224761/xml-schemas/eac-cpf (update: this comment previously pointed to https://github.com/SAA-SDT/eac-cpf-schema/tree/006f02026763bce62fbf1b50f12d273271cf58a3/xml-schemas/eac-cpf because my clipboard hadn't updated when I copied the link, so it was the wrong one!)

Both agencyCode (1..1) and agencyName (1..n) are required, but, like any XML element by default, they can be empty (so there are no problems with the migration process). That way, a template will include both elements by default, and users won't have to go through the process of choosing one or the other just to get a valid file (although that process is still required for multipleIdentities vs. cpfDescription). Then we can add something to the Schematron that checks that at least one of those elements has some content.
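Option 3 splits validation into two layers: the schema accepts empty <agencyCode> and <agencyName> elements, and a separate rule-based check looks for content. The following Python sketch mirrors that split under the cardinalities given above (agencyCode 1..1, agencyName 1..n). The function names are hypothetical, and the second function only imitates what a Schematron rule would do; it is not Schematron.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def grammar_ok(agency: ET.Element) -> bool:
    """Schema layer: exactly one <agencyCode> and one or more <agencyName>; both may be empty."""
    return len(agency.findall("agencyCode")) == 1 and len(agency.findall("agencyName")) >= 1


def rule_ok(agency: ET.Element) -> bool:
    """Rule layer (Schematron-style): at least one <agencyCode> or <agencyName> has text."""
    candidates = agency.findall("agencyCode") + agency.findall("agencyName")
    return any("".join(el.itertext()).strip() for el in candidates)


def validate_option_3(agency: ET.Element) -> tuple[bool, bool]:
    """Return (schema result, rule result) separately, as two validation passes would."""
    return grammar_ok(agency), rule_ok(agency)
```

An untouched template such as <maintenanceAgency><agencyCode/><agencyName/></maintenanceAgency> passes the first check and fails the second, which is exactly the case the Schematron rule is meant to catch.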

kerstarno commented 3 years ago

Apologies, but I'm setting this back to "Schema" as we currently have three different versions of the schema and need to confirm if they work as intended and which will be the final one:

@fordmadox - I've assumed that option 3 is what's currently in the development branch, as the link posted in the previous comment is the same link as for option 2 (plus the last update to the development branch seems to match). Also, the above only applies to the RNG schema of option 3. The XSD schema of option 3 is exactly as option 2. (updated following the update in the previous comment)

Not 100% sure, but I'm assuming that option 3 (with the addition of a Schematron rule to check that at least one of <agencyCode> or <agencyName> has content) might be closest to what @SJagodzinski would like to see.

Should that eventually be the chosen option, I would just like to point out that, instead of either making a required but empty element optional or making its requirement clearer by making content mandatory (as we now have in <recordId> and <part>), we would actually be adding yet another required element that can be left empty.

fordmadox commented 3 years ago

For option 4, how about this:

There are, of course, many more options, but we should settle on one before/by tomorrow so that we can ready everything in time for the Call for Comments.

Option 4 can be tested here: https://github.com/SAA-SDT/eac-cpf-schema/tree/0faf1b2a8e2019e21d0c1d82ba7e3cb5ffdbd3e8/xml-schemas/eac-cpf. (note the commit hash in the URI, but it can also be tested via the standard 'development' branch URL currently). The only thing that cannot be tested here is the Schematron part, but outside of that, I believe this option should align with what @SJagodzinski described previously.

fordmadox commented 3 years ago

All that said, I think that it would be easiest if we continued to say that agencyName was required (as in EAC 1), and allowed it to be empty. We could then add a Schematron rule to ensure that either an agencyName or an agencyCode was available that had text. That way, we could still make agencyName first in the list, since it would always be required (even if empty), and agencyCode would be optional.

kerstarno commented 3 years ago

If only we had thought of the Schematron option during the last Schema Team meeting, when we started this conversation about whether it made sense to require elements that can then be left empty, and before posting the Schema Team's suggestion here on GitHub. ;-)

With all these Schematron checks, just to confirm: will it still be possible to use the EAC-CPF 2.0 schema without Schematron?

Also, would we reconsider the decision to require content via the schema for <recordId> and <part>, and do these checks via Schematron as well?

karinbredenberg commented 3 years ago

Using the XSD Schema without the Schematron is no problem but all validation handled in the Schematron won't be run.

Moving this check into Schematron only might then mean that it won't be checked if you haven't implemented Schematron validation: with the XSD you need to add that separately, while in the RNG it is enforced in the schema itself.

kerstarno commented 3 years ago

Re-tested option 4 (https://github.com/SAA-SDT/eac-cpf-schema/tree/0faf1b2a8e2019e21d0c1d82ba7e3cb5ffdbd3e8/xml-schemas/eac-cpf) and can confirm:

The above applies to both schemas, RNG and XSD.

kerstarno commented 3 years ago

Quoting @karinbredenberg:

Using the XSD Schema without the Schematron is no problem, but all validation handled in the Schematron won't be run.

Moving this check into Schematron only might then mean that it won't be checked if you haven't implemented Schematron validation: with the XSD you need to add that separately, while in the RNG it is enforced in the schema itself.

Thanks, Karin. I was mainly wondering about the requirements with regard to elements or attributes not being empty. These checks, at least the ones we are talking about in the currently remaining issues, are defined neither in the XSD nor in the RNG schema per se. If that's ok, that's ok. Just making sure.

karinbredenberg commented 3 years ago

In both XSD and RNG it is possible to set rules ensuring that an element, when used, is not empty. Such rules can also be defined in the Schematron.

kerstarno commented 3 years ago

Quoting @karinbredenberg:

In both XSD and RNG it is possible to set rules ensuring that an element, when used, is not empty. Such rules can also be defined in the Schematron.

I was mainly referring to what is or is not present in the most recent version of the development schemas, not the general possibilities :-)

kerstarno commented 3 years ago

@SJagodzinski we still have four possible solutions in the schema for this one at the moment. Could you please confirm whether option 4 (https://github.com/SAA-SDT/eac-cpf-schema/issues/86#issuecomment-767611723) is the preferred way to go? Thanks very much in advance.

kerstarno commented 3 years ago

Option 4 has been confirmed during the EAC Team meeting on 5 February.

With that, this element is ready schema-wise.