CredentialEngine / CredentialRegistry

Repository for development of the Credential Registry
Apache License 2.0

Frame an approach for CER to handle profiles of the CTDL & CEASN #223

Closed stuartasutton closed 2 years ago

stuartasutton commented 5 years ago

Through the Navy CRADA, we are looking at specialized (profile) applications of CTDL and CEASN. Functionally, these profiles will be composed of the two base languages plus extensions and refinements to the base terms. Extensions and refinements in such profiles can take the following forms:

  1. Refinements through creation of new sub-properties and sub-classes for base language properties and classes;
  2. Addition of new properties and classes;
  3. Value constraints particular to the profiles (value vocabularies); and
  4. Profile-unique cardinality requirements.

Initially, this issue asks two key questions:

  1. Should CER ingest and otherwise handle data based on these profiles? And
  2. If so, should it:
    • Fully ingest the data (i.e., validate against the profiles); or
    • Ingest the data but ignore (discard) what it does not understand or what does not meet minimal data requirements; or
    • Ingest the data but ignore (discard) new properties and classes and "dumb down" formally declared sub-classes or sub-properties (unknown to the CER) to the more general classes and properties known to the CER (see item 1 in the list above)?

One implication would be the need to handle multiple profile-specific validation mechanisms in addition to the current CE profile's validation constraints.
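The "dumb-down" option in the list above can be sketched roughly as follows. This is a hypothetical illustration, not how the CER actually ingests data: it assumes flat records keyed by prefixed term names, a profile that publishes its `subPropertyOf`/`subClassOf` declarations as simple mappings, and invented `ex:` terms. Following RDFS semantics, a value given for a subproperty is also a value of its parent property, so values are collected under the parent rather than overwriting it.

```python
# Hypothetical sketch of the "dumb-down" ingest option. Term names prefixed
# with ex: are invented; the CER's real term lists are far larger.

# Terms and classes assumed to be known to the CER.
CER_TERMS = {"ceterms:name", "ceterms:description",
             "ceterms:ownedBy", "ceterms:offeredBy"}
CER_CLASSES = {"ceterms:Credential", "ceterms:Certification"}

# Refinements declared by a (hypothetical) profile: child term -> parent term.
SUB_PROPERTY_OF = {"ex:formalDescription": "ceterms:description"}
SUB_CLASS_OF = {"ex:MilitaryCertification": "ceterms:Certification"}


def generalize(term, hierarchy, known):
    """Follow declared parents until reaching a term the CER knows, else None."""
    seen = set()
    while term not in known:
        if term in seen or term not in hierarchy:
            return None  # brand-new term (or a cyclic declaration): nothing to map to
        seen.add(term)
        term = hierarchy[term]
    return term


def dumb_down(record):
    """Map a flat profile record onto CER terms.

    Returns (kept, discarded). Property values are collected into lists because,
    under RDFS semantics, a value of a subproperty is also a value of its parent.
    """
    kept, discarded = {}, []
    for key, value in record.items():
        if key == "@type":
            cls = generalize(value, SUB_CLASS_OF, CER_CLASSES)
            if cls is None:
                discarded.append(value)
            else:
                kept["@type"] = cls
            continue
        prop = generalize(key, SUB_PROPERTY_OF, CER_TERMS)
        if prop is None:
            discarded.append(key)
        else:
            kept.setdefault(prop, []).append(value)
    return kept, discarded
```

With this approach, a record typed `ex:MilitaryCertification` would be stored as a `ceterms:Certification`, an `ex:formalDescription` value would land under `ceterms:description`, and a brand-new term such as `ex:securityClearance` would simply be reported as discarded.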

Lomilar commented 5 years ago

Formally, my vote is that this question should be deferred until there are signals that folks want the competencies in the CER.

That being said, this is a good use case. Here are some comments.

The last comment I made may require some explanation.

jeannekitchens commented 5 years ago

@stuartasutton @Lomilar @cwd-mparsons @siuc-nate There are multiple issues here, and they should be reviewed irrespective of the CRADA. The issues include:

  1. The CTDL/CTDL ASN, by design, can and should allow for additional profiles. Credential Engine's schema management system and web pages on the technical site have to support the eventuality of having additional profiles that need to be managed by the Engine and therefore be visible on the tech site as terms etc. These profiles may or may not require a CER profile. These profiles are irrespective of companion CER profiles.
  2. The CER needs to support more than one CER validation profile.
  3. Having more than one CER validation profile entails having assistant APIs for each CER profile.

The discussions we need to have should be independent of the Navy. Other examples where these requirements come into play are countries or regions that require a profile of the CTDL/CTDL ASN and use it to publish data to the Registry.

For this discussion, the role of CE/CASS as a publishing and management tool for the Registry has to be considered.

siuc-nate commented 5 years ago

@stuartasutton As long as the data meets our minimum data requirements, organizations can require themselves to use as much of the CTDL as they want. It should not be up to us to enforce an organization's schema upon that same organization; they should have enough discipline to publish to their own minimums. However, as we discussed in our call last week, it may be useful to let them indicate what their minimums are so that other organizations may publish to them (again, without our enforcing anything beyond our own minimums). But I don't know if that information belongs in the Registry; perhaps it should be part of some other service.

@jkitchensSIUC We do have most of that:

The CTDL/CTDL ASN, by design, can and should allow for additional profiles. Credential Engine's schema management system and web pages on the technical site have to support the eventuality of having additional profiles that need to be managed by the Engine and therefore be visible on the tech site as terms etc.

The schema management system is designed to handle multiple schemas - currently we have CTDL, CTDL-ASN, and the Meta schema. The terms page, serializations page, mapping guidance page, and release history page are written generically to render whatever schema is thrown at them.

These profiles may or may not require a CER profile. These profiles are irrespective of companion CER profiles.

I'm not quite sure what the second half of that means, but the schema management system is also designed to handle multiple profiles for each schema, though currently we only have the registry profile for CTDL and the registry profile for CTDL-ASN.

The CER needs to support more than one CER validation profile.

The validation is maintained by the schema management system as part of the profile mechanism, so this is already present. Note: If by chance you are referring specifically to JSON schema, this is not being used since:

Having more than one CER validation profile entails having assistant APIs for each CER profile.

This would likely be the same API, but with different validation constraints applied.
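To illustrate "the same API, but with different validation constraints applied", here is a minimal sketch in which one `validate` function accepts any number of named constraint sets. The `Constraints` structure, the `ex:serviceBranch` property and its vocabulary, and the contents of both sets are hypothetical; they are not the schema management system's actual format or the CER's actual minimum data requirements.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """A named set of constraints (hypothetical structure for illustration)."""
    name: str
    required: frozenset = frozenset()   # properties that must be present
    one_of: tuple = ()                  # groups in which at least one property must be present
    vocabularies: dict = field(default_factory=dict)  # property -> set of allowed values


# Illustrative only; not the CER's actual minimum data requirements.
REGISTRY = Constraints(
    name="registry",
    required=frozenset({"ceterms:name", "ceterms:description"}),
    one_of=(("ceterms:ownedBy", "ceterms:offeredBy"),),
)

# A hypothetical community profile that asks for more than the registry does.
EXAMPLE_PROFILE = Constraints(
    name="example-profile",
    required=frozenset({"ceterms:name", "ceterms:description", "ceterms:subjectWebpage"}),
    vocabularies={"ex:serviceBranch": {"ex:Navy", "ex:Army"}},
)


def validate(record, *constraint_sets):
    """Check one record against every given constraint set; return error strings."""
    errors = []
    for cs in constraint_sets:
        for prop in sorted(cs.required):
            if prop not in record:
                errors.append(f"[{cs.name}] missing required {prop}")
        for group in cs.one_of:
            if not any(p in record for p in group):
                errors.append(f"[{cs.name}] must provide one of: {', '.join(group)}")
        for prop, allowed in cs.vocabularies.items():
            if prop in record and record[prop] not in allowed:
                errors.append(f"[{cs.name}] {prop} value {record[prop]!r} is not in the vocabulary")
    return errors
```

Calling `validate(record, REGISTRY)` enforces only the registry set, while `validate(record, REGISTRY, EXAMPLE_PROFILE)` layers a community's own requirements on top through the same entry point.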

At some point if/when time allows, I would like to document the validation/constraint mechanism I am currently using in the schema management system to create and manage profiles.

stuartasutton commented 5 years ago

@siuc-nate, you say that JSON Schema is not being used and that validation is being done by the API. While that's OK for data published via the API, what about those who want to independently validate their data against what the CER requires? Would we then be in ShEx / SHACL land?

stuartasutton commented 5 years ago

Such independent validation mechanisms are what give IMS specs a different level of power and acceptability, because your data can be independently verified as CER compliant.

siuc-nate commented 5 years ago

It would most likely be something like that. I would of course need to come up with a means of converting what I've got now so that it outputs its constraints in those schemas. I think I mentioned this to you briefly on a previous call: I experimented with that at one point but didn't have time to fully work through it. Particularly difficult were the same things that trip up the JSON Schema output: the various specific tweaks to certain classes and the conditional requirements (e.g., must provide ownedBy or offeredBy). It should still be possible, though.
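One way the constraints could be published for independent validation is to export them as SHACL shapes. The sketch below is hypothetical: the input format, the `ex:` shape namespace, and the function names are invented, and it covers only required properties plus either/or requirements. The ownedBy-or-offeredBy rule maps onto SHACL's `sh:or` over two property shapes, each with `sh:minCount 1`; the generated Turtle should then be usable with any SHACL processor.

```python
# Hypothetical sketch: export simple constraints as SHACL shapes in Turtle.
# The input format and the ex: shape namespace are invented for illustration.

PREFIXES = {
    "sh": "http://www.w3.org/ns/shacl#",
    "ceterms": "https://purl.org/ctdl/terms/",
    "ex": "https://example.org/shapes/",
}


def property_shape(prop):
    """A blank-node property shape requiring at least one value for prop."""
    return f"[ sh:path {prop} ; sh:minCount 1 ]"


def to_shacl(shape_name, target_class, required, one_of=()):
    """Build a SHACL node shape; assumes at least one constraint is given."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines += ["", f"ex:{shape_name} a sh:NodeShape ;",
              f"    sh:targetClass {target_class} ;"]
    clauses = [f"    sh:property {property_shape(p)}" for p in required]
    # Conditional requirement such as "ownedBy or offeredBy": sh:or over shapes.
    clauses += [f"    sh:or ( {' '.join(property_shape(p) for p in group)} )"
                for group in one_of]
    lines.append(" ;\n".join(clauses) + " .")
    return "\n".join(lines)
```

For example, `to_shacl("CredentialShape", "ceterms:Credential", ["ceterms:name"], [("ceterms:ownedBy", "ceterms:offeredBy")])` yields a node shape whose last line is `sh:or ( [ sh:path ceterms:ownedBy ; sh:minCount 1 ] [ sh:path ceterms:offeredBy ; sh:minCount 1 ] ) .`. The class-specific tweaks mentioned above would need more SHACL features than this sketch uses.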

Lomilar commented 5 years ago

Exposing the validation component as a (freely usable) web service could help greatly here.
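A freely usable validation service could be as thin as the following WSGI sketch, built only on the Python standard library. The endpoint path (`/validate/<profile>`), the response shape, and the hard-coded constraint set are assumptions for illustration; a real service would presumably load its profiles from the schema management system.

```python
import io
import json

# Hypothetical constraint sets; a real service would load these from the
# schema management system rather than hard-coding them.
PROFILES = {
    "registry": {"required": ["ceterms:name", "ceterms:description"],
                 "one_of": [["ceterms:ownedBy", "ceterms:offeredBy"]]},
}


def check(record, profile):
    """Return a list of error strings for one record against one profile."""
    errors = [f"missing {p}" for p in profile["required"] if p not in record]
    for group in profile.get("one_of", []):
        if not any(p in record for p in group):
            errors.append("must provide one of " + ", ".join(group))
    return errors


def app(environ, start_response):
    """WSGI app: POST /validate/<profile> with a JSON object; replies with JSON."""
    prefix = "/validate/"
    path = environ.get("PATH_INFO", "")
    name = path[len(prefix):] if path.startswith(prefix) else None
    if environ.get("REQUEST_METHOD") != "POST" or name not in PROFILES:
        status, body = "404 Not Found", {"error": "use POST /validate/<profile>"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            record = json.loads(environ["wsgi.input"].read(length) or b"{}")
            if not isinstance(record, dict):
                raise ValueError("not a JSON object")
            errors = check(record, PROFILES[name])
            status = "200 OK"
            body = {"profile": name, "valid": not errors, "errors": errors}
        except ValueError:  # also covers JSON and UTF-8 decoding errors
            status, body = "400 Bad Request", {"error": "body must be a JSON object"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(path, record):
    """Invoke the app in-process (no network), e.g. for a quick local check."""
    data = json.dumps(record).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(data)), "wsgi.input": io.BytesIO(data)}
    result = {}

    def start_response(status, headers):
        result["status"] = status

    payload = b"".join(app(environ, start_response))
    return result["status"], json.loads(payload)
```

To serve it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` is enough; anyone could then POST a JSON record to `/validate/registry` and get back the same verdict the publishing API would apply.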

siuc-nate commented 5 years ago

Per our 1/15/2019 meeting:

A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

This definition includes what are often called "application profiles", "metadata application profiles", or "metadata profiles".

cwd-mparsons commented 5 years ago

@science @rsaksida The primary problem that we want to solve is to allow publishing different profiles for CER data. I think that the use of communities might fulfill this requirement. The data that is currently published to the registry is considered a registry profile, based on the classes and properties currently handled. I don't recall if there are limitations to communities in this regard. As noted earlier in this thread, there may be variations in the profiles, but they would be based on the same base schema. I think that one of the requirements of communities is that the same schema has to be used for all communities (at least on the same server).

stuartasutton commented 5 years ago

A profile would need to accommodate:

  1. Refinement of the profiled schema's existing properties and classes through creation of new subclasses and subproperties;
  2. Inclusion of newly coined classes and properties tailored to the profile community's needs (coined new or borrowed from a different namespace);
  3. New constraints on value spaces (i.e., different controlled vocabularies); and
  4. Other usual constraints (different cardinality, optionality, defined subset of properties and classes from the schema being profiled).
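The four kinds of accommodation listed above could be captured in a profile description along these lines. This is a hypothetical data model (the class and field names are invented, and `ex:` stands in for a profile's own namespace). The `problems` check encodes the point made elsewhere in this thread that subclassing a base term is not a modification, while coining terms inside the base namespaces would be.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical description of a profile of CTDL / CTDL-ASN."""
    name: str
    base_prefixes: set                                   # namespaces being profiled
    sub_classes: dict = field(default_factory=dict)      # 1. new subclass -> base class
    sub_properties: dict = field(default_factory=dict)   # 1. new subproperty -> base property
    new_terms: set = field(default_factory=set)          # 2. coined or borrowed terms
    vocabularies: dict = field(default_factory=dict)     # 3. property -> allowed concepts
    cardinality: dict = field(default_factory=dict)      # 4. property -> (min, max or None)
    included_terms: set = field(default_factory=set)     # 4. subset of base terms used

    def problems(self):
        """Declared terms that sit in a base namespace.

        Subclassing a base term is fine, but coining or redefining terms inside
        the base namespaces would amount to modifying the profiled schema.
        """
        declared = set(self.sub_classes) | set(self.sub_properties) | self.new_terms
        return sorted(t for t in declared if t.split(":", 1)[0] in self.base_prefixes)


example = Profile(
    name="example-profile",
    base_prefixes={"ceterms", "ceasn"},
    sub_classes={"ex:MilitaryCertification": "ceterms:Certification"},
    sub_properties={"ex:formalDescription": "ceterms:description"},
    new_terms={"ex:serviceBranch", "ceterms:clearanceLevel"},  # 2nd is deliberately misplaced
    vocabularies={"ex:serviceBranch": {"ex:Navy", "ex:Army"}},
    cardinality={"ceterms:subjectWebpage": (1, 1)},
)
```

Here `example.problems()` flags only `ceterms:clearanceLevel`, a term coined in the CTDL namespace rather than in the profile's own.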

siuc-nate commented 5 years ago

Essentially we just need a schema that:

Then the profiles could be published and maintained like any other JSON data. Once we're able to publish CTDL, CTDL-ASN, and the registry profile of each to the registry itself, and get them back out successfully, we'll know we've got it right.

I should caution though that allowing different profiles means that, unless we do something to prevent it, a given profile might not require as much data as the registry profile requires (e.g. a credential in some other profile might only require the name, which would mean that that data is incomplete if we try to load it into our system, which could lead to a number of errors). It's also possible that a profile might (and probably will) require data that is not present in vanilla CTDL - meaning vanilla CTDL data would be incomplete (and thus lead to errors) in a system meant to consume data from that profile.
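The compatibility concern above can be made concrete by comparing the required properties of two profiles in both directions. A minimal sketch, assuming each profile is reduced to a mapping from class to its set of required properties; the contents of both example profiles are invented and do not reflect the actual registry profile.

```python
def coverage_gaps(publisher, consumer):
    """For each class, list properties the consumer requires but the
    publisher's profile does not guarantee (profiles: class -> set of props)."""
    gaps = {}
    for cls, needed in consumer.items():
        missing = set(needed) - set(publisher.get(cls, ()))
        if missing:
            gaps[cls] = sorted(missing)
    return gaps


# Illustrative profiles only.
REGISTRY_PROFILE = {"ceterms:Credential": {"ceterms:name", "ceterms:description",
                                           "ceterms:subjectWebpage"}}
OTHER_PROFILE = {"ceterms:Credential": {"ceterms:name", "ex:serviceBranch"}}
```

Run one way, `coverage_gaps(OTHER_PROFILE, REGISTRY_PROFILE)` shows what a thinner profile would leave missing for systems built on the registry profile; run the other way, it shows the profile-specific data that vanilla CTDL records would lack. Conditional (either/or) requirements would need a richer comparison than plain set difference.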

stuartasutton commented 5 years ago

Possibly helpful resources on emerging W3C and IETF specifications for profiles:

  1. Negotiating Profiles in HTTP;
  2. The Profiles Ontology; and
  3. Content Negotiation by Profile

Nice working definition (Content Negotiation by Profile):

A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

This definition includes what are often called "application profiles", "metadata application profiles", or "metadata profiles".

siuc-nate commented 5 years ago

Would there be any kind of restrictions placed on this? Once you enable profiling, you enable the addition, removal, and modification of classes and properties - effectively turning the credential registry into a schema registry. If we then allow data to be published to the registry using any schema that also exists in the registry, that effectively turns the registry into a big online generic database where records may or may not be compatible at all with CTDL or with systems that consume from the registry.

Is it enough that we'd only let certain organizations publish schemas/profiles, or would some other limits need to be imposed?

I go back to Stuart's initial question:

Should CER ingest and otherwise handle data based on these profiles?

Or would something like that be in the domain of a project with a much broader scope than credentials?

stuartasutton commented 5 years ago

@siuc-nate, just a point of clarification: a profile does not modify classes and properties, and creating a subclass or subproperty of an existing class or property is not a modification.

cwd-mparsons commented 5 years ago

@stuartasutton Would all refinements/additions to a profile schema have to be part of the 'base' CTDL schema? If not, then I don't believe the community approach would work.

stuartasutton commented 5 years ago

No, bullet 2 in my comment above includes the coining of new terms, which should likely be in their own namespace defining the profile. We clearly would not want the CTDL to grow just because a new profile has a term that is unique to the discourse of the community of practice for which it is being defined.

stuartasutton commented 5 years ago

There are a number of policy issues (should we/shouldn't we) that should be set aside for later, keeping the focus here on the technical issues of accommodating such profiles.

edgarf commented 2 years ago

We are closing this for now. It will be re-opened once the issue resurfaces.