arkivverket / noark5-tjenestegrensesnitt-standard

KorrespondansepartIntern defines fields not available in Noark 5.5 #282

Closed ivaylomitrev closed 1 year ago

ivaylomitrev commented 1 year ago

       Project  NOARK 5 Tjenestegrensesnitt
      Category  Noark 5.5.0 TG version 1.0
      Severity  comment / objection
  Message type  omitted / needs clarification
User reference  user@example.com
 Document part  # Chapter #7 (KorrespondansepartIntern)

Description

The KorrespondansepartIntern specialization specifies four fields:

  • administrativEnhet
  • referanseAdministrativEnhet
  • saksbehandler
  • referanseSaksbehandler

Noark 5.5, however, does not identify referanseAdministrativEnhet or referanseSaksbehandler fields in its metadata catalog which makes these unmappable for vendors with existing solutions. I understand the requirement and it does make sense, but this may lead to issues for integrating parties and should, in my opinion, be reconsidered.

Desired change

N/A; needs to be discussed

petterreinholdtsen commented 1 year ago

[ivaylomitrev]

The KorrespondansepartIntern specialization specifies four fields:

  • administrativEnhet
  • referanseAdministrativEnhet
  • saksbehandler
  • referanseSaksbehandler

Noark 5.5, however, does not identify referanseAdministrativEnhet or referanseSaksbehandler fields in its metadata catalog which makes these unmappable for vendors with existing solutions. I understand the requirement and it does make sense, but this may lead to issues for integrating parties and should, in my opinion, be reconsidered.

The way I understand the intention behind the referanseAdministrativEnhet and referanseSaksbehandler values, they refer to the SystemID value of the Admin.AdministrativEnhet and Admin.Bruker entities and make it possible to look up the user in the API and be sure the correct user is looked up, independent of any user name changes done over time. Looking at the specification, it might make more sense to make it into a relation link to the correct Bruker instead.

Anyone else got an opinion or suggestions?

-- Happy hacking Petter Reinholdtsen

ivaylomitrev commented 1 year ago

Just a note that although this sounds convenient, it would require a new major version of the specification as it would require a change in the vendor implementations (i.e. it would be backwards-incompatible with regard to version 1.0).

petterreinholdtsen commented 1 year ago

[ivaylomitrev]

Just a note that although this sounds convenient, it would require a new major version of the specification as it would require a change in the vendor implementations (i.e. it would be backwards-incompatible with regard to version 1.0).

It might, depending on how it is done, yeah. If the relation is an optional convenience link, which can be in place in addition to the 1.0 required values, I suspect it can be in a minor update.

In any case, I suspect this requires some more discussion to bring all consequences to the table. :)

The key part of such SystemID values all across the API and the Noark 5 extraction format is that they should be stable across extractions, to ensure duplicates and changes can be tracked across XML contents. Not sure how much it affects this particular value, as neither of the Noark 5 XSD schemas (checked both v5.0 and future v5.1) includes these values. Perhaps it should be possible to include this value in XML in the future? Should perhaps ask for a change in Noark 5 to make it happen.

-- Happy hacking Petter Reinholdtsen

tsodring commented 1 year ago

It's an interesting point that has been raised and one that we will likely meet again. It is worth remembering that Noark 5 is a conceptual standard that can be used as a basis for a record-keeping system. Noark 5 is not a technical standard and the standard is explicit about that.

Noark 5 sets requirements for archive structure, metadata and functionality, but not for the technical implementation of those requirements. The standard therefore does not define a system, or a type of system, but makes room for different solutions. The aim is to avoid the standard resulting in universal solutions that are used for all types of processes ... A Noark 5 core is a conceptual notion, which can be a separate system module (an archive core), but it does not have to be. A "Noark 5 core" is a set of requirements that shall or should be met by a solution (one or more systems) in order to be approved according to Noark 5

I think this point is often missed. When going from a conceptual standard to a technical standard, choices will have to be made. This might be one of those times. The Noark 5 standard does not say anything about .well-known/openid-configuration. Neither does it specify system information:

{
  "leverandoer": "Hoffleverandøren",
  "produkt": "Arkivsystemet Noark 5 kjerne",
  "versjon": "0.1",
  "versjonsdato": "2019-03-22",
  "protokollversjon": "1.0 Beta"
}

As such, I am left wondering how important a mapping from the Noark 5 API specification to an implementation of the Noark standard is. An implementation of the Noark 5 API specification can, and will, require additional work to ensure compliance if the system was not built in accordance with the specification to begin with.

ivaylomitrev commented 1 year ago

I believe that (one of) the most common use cases for the implementation of the Noark 5 Web Services would be by vendors who have already acquired Noark compliance over the years. As a result, they will implement the Noark 5 Web Services on top of existing solutions that have been on the market for decades and have implemented countless other APIs on the way. Although it sounds reasonable to extend the specification with additional model information, this has to be taken into account, as such new metadata might affect how much and how easily such vendors can adopt the new specification. The development of new archive cores, of course, would not be bound by such limitations.

The aforementioned endpoints are not mentioned in the standard, yes. As such, I also agree the API is free to define its own contract for them. However, the correspondence parties are part of the model defined in Noark 5, albeit a conceptual one. As such, in my opinion, there is a fine line between extending the API within reasonable limits and making it difficult (if not impossible) for existing Noark 5-compliant systems to implement, due to the addition of (potentially unmappable) information not available in the Noark 5.5 data model. In other words, I would expect the API to define a means of communicating with Noark 5-compliant systems rather than to define a Noark 5 system of its own and impose implementation details. As you mentioned, vendors are free to implement the conceptual standard, and the more non-Noark concepts are present, the greater the chance that one vendor or another will have issues adhering to them. If the goal of the specification is to define an API for the entire market, I would expect it to be easily adoptable by all kinds of systems, both existing and new, and I would expect it to represent a subset of the definitions in the standard.

I am in no way critiquing the current specification work and I value the work and the time spent on it. What you read above is my entirely subjective opinion on the matter as a developer reading the specification and mapping it to an existing archive core. I am just trying to present another point of view. Feel free to disregard my comment if you do not think it applies to the founding principles of the API specification.

tsodring commented 1 year ago

I understand the point you are making and will provide a point-of-view from the development of the API specification. The Noark standard is a conceptual standard that can be used to define many different recordkeeping systems. The standard itself says that the requirements are more like guidelines. However, when reading the standard, it feels like a technical standard.

I was pretty vocal about the importance of an API description from about 2010 and pushed for an API standard. My belief then was that it must be possible to define an API that works for this standard as everything has technical descriptions, including rich XSD descriptions. Eventually the spec started to take form, but it appeared that no vendor was really interested in helping in the development of the specification. I recall sitting at a meeting at Riksarkivet and hearing vendor after vendor tell how bad the project was.

Arkitektum had a proprietary model description in some MS Enterprise modelling tool, and the standard was mainly described in an MS Word document. Riksarkivet listened to Petter, and he got the standard over to puml and rst in order to support a more democratic and transparent development process. This worked really well, and we were able to create good processes to fix the standard, as can be seen by the commit history and related pull requests.

Still, no vendors were interested in helping or contributing to this process. There were discussions on what would be good approaches, and on expected client and server requirements for a Noark 5 core. Issues and pull requests were drafted, yet no vendors joined in on the discussion. The time to influence the 1.0 version was when it was under development. If there were alternative approaches that needed to be described, then they should have been described while the standard was being developed.

It should be noted as well that vendors often have both a server and client view when developing a system. The Noark 5 API specification is an attempt to create an API specification where the client is unknown.

However, the editorial board would love to get input on a 1.1 version, so you are very welcome to join in further work on making the best API description possible for Noark 5 systems.

petterreinholdtsen commented 1 year ago

It seems like a good idea to bring Noark 5 in sync with Noark 5 Tjenestegrensesnitt when it comes to these two fields, so I wrote up <URL: https://github.com/arkivverket/noark5-standard/issues/146 > to ask for it. It might raise the question to a wider audience.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen commented 1 year ago

This was discussed briefly during today's editorial meeting.

The premise of the question is that a digital archive system would only have the fields listed in Noark 5 available, and thus would not have information about who was handling cases and which department this person was part of. We are not sure this is a correct assumption, as Noark 5 describes what should be extracted from a closed archive, while an open archive will need to keep track of more information to function. Is there really an existing case handling and archiving system that does not keep track of who does the case handling while the cases are being handled or documents are being filed? If so, we would love to hear how it keeps track of access control.

Regarding the UUID identifiers for admin users and departments, one way to handle it if the existing Noark 5 archive system lacks such identifiers is to derive UUIDs from the internal identifiers that most likely already exist. One way to do this is described in the IETF RFC documenting UUIDs, section 4.3, <URL: https://www.rfc-editor.org/rfc/rfc4122#section-4.3 >. Using this approach it is possible to derive a stable UUID as long as there is some stable ID associated with the user or department. If there is a system without such a stable user or department ID, please let us know which system this is.
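
A minimal sketch of such a derivation in Python, using the standard library's `uuid.uuid5` (the SHA-1 name-based variant from RFC 4122 section 4.3). The namespace value, the `arkiv.example.com` name, the `bruker`/`enhet` prefixes and the internal ID `4711` are hypothetical choices for illustration. The specification does not prescribe any of them.

```python
import uuid

# Hypothetical namespace for one archive installation. Any fixed UUID works,
# as long as it never changes between runs or extractions.
ARCHIVE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "arkiv.example.com")


def derive_system_id(kind: str, internal_id: str) -> uuid.UUID:
    """Derive a stable name-based UUID from an existing internal identifier."""
    # Prefix with the entity kind so that a user and a department sharing
    # the same internal ID still get different UUIDs.
    return uuid.uuid5(ARCHIVE_NAMESPACE, f"{kind}:{internal_id}")


# Candidate values for referanseSaksbehandler and referanseAdministrativEnhet.
saksbehandler_ref = derive_system_id("bruker", "4711")
enhet_ref = derive_system_id("enhet", "4711")
```

The same inputs always give the same UUID, so the value stays stable as long as the internal identifier and the namespace do.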

ivaylomitrev commented 1 year ago

The discussion has digressed a little bit from the original topic, but I will respond to the comment about generating UUIDs below, because I have concerns about it.

I confess I have not used the algorithm described in section 4.3 of RFC 4122, but I see it describes the use of a hash function, which makes these UUIDs non-reversible.

Here are three scenarios to consider:

  1. Archival core generates a UUID from a non-UUID stable identifier and uses it in the referanse* fields. An integrating system is supposed to read these and find the user in an external user directory.
  2. An integrating system pushes data to an archival core and sets the referanse* fields by means of generating a UUID from a non-UUID stable identifier (assume submission of a historical archive, for example).
  3. An archival core automatically populates the referanse* fields based on the information in the user token with which the data was published, but the external user directory provides non-UUID stable identifiers so the archival core generates a UUID from them.

In all of the described cases the archiving system will store a UUID for the referanse* fields and will be compliant with the requirements of the specification. However, in all cases, user data will be obfuscated and its link to the external user directory will be irreversibly lost and neither the archival core, nor the integrating system, nor any other party will be able to identify which user exactly that is in the external user directory without using rainbow tables.
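
To make the concern concrete, here is a rough Python sketch of the kind of precomputed lookup table a party would need in order to map a stored UUID back to a directory entry. It assumes name-based UUIDs built with `uuid.uuid5`, and it assumes the party knows both the namespace used at derivation time and the full list of directory identifiers. The namespace and the identifiers below are made up.

```python
import uuid

# Hypothetical namespace. It must be the same one used when the UUIDs were derived.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "arkiv.example.com")


def build_reverse_index(directory_ids):
    """Map each derived UUID back to the directory identifier it came from."""
    return {uuid.uuid5(NAMESPACE, ident): ident for ident in directory_ids}


# Made-up identifiers currently present in the external user directory.
directory_ids = ["jdoe", "kari.nordmann", "ola.nordmann"]
index = build_reverse_index(directory_ids)

# A value as it might be stored in referanseSaksbehandler.
stored_ref = uuid.uuid5(NAMESPACE, "kari.nordmann")
resolved = index.get(stored_ref)

# An identifier that is no longer in the directory cannot be recovered from its UUID.
unknown = index.get(uuid.uuid5(NAMESPACE, "deleted.user"))
```

The lookup only succeeds for identifiers that are still enumerable in the directory and only if the namespace is known. For anything else the UUID alone gives no way back.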

Due to the above I do not see why the limitation to use only UUIDs for such fields is present (there are others in the specification too). It sounds like this may present issues for data quality without bringing any benefits (apart from cosmetic and consistency ones).

tsodring commented 1 year ago

This issue was discussed in the editorial meeting today. We believe we have clarified what needs to be clarified. It is worth remembering that the Noark 5 standard is a conceptual standard that is open to interpretation, while the API spec is a technical standard. In some ways it can be seen as an opinionated interpretation of Noark 5, made so that as many clients as possible can use Noark in a standardized manner, ensuring that public agencies in Norway can reduce costs when developing integrations.

The issue of UUIDs was also discussed in https://github.com/arkivverket/noark5-tjenestegrensesnitt-standard/issues/26

If something is unclear or something needs to be added, we are open to include additional information in the 1.1 version of the API specification, but for now I think this issue can be closed.