Data Design Issue

siuc-nate commented 6 years ago

Discussion of #508 led to uncovering deeper issues with our data design as it relates to JSON-LD, the Registry, CASS, signatures, etc. I will attempt to document this as clearly as possible. We need to align all of our systems to be able to handle the following:

Situation

CTDL

Relates to Organizations, Credentials, Assessments, Learning Opportunities
Should retain 1:1 relationship between CTIDs and Envelopes (one top-level thing per envelope)?
Must enable blank nodes per #508

CTDL-ASN

Relates to Competency Frameworks and Competencies (see #522)
Must enable each framework and each competency to have a CTID and be resolvable as a top-level thing
- One envelope per thing? or
- One envelope per framework and each competency can be resolved (extracted from the payload and returned as a standalone object)? or
- Some extension of the /resources/CTID URI structure that issues the CTID only to the competency framework and hangs the rest off of that, e.g. /resources/[CTID]/[UUID]?
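To make the third option concrete, here is a minimal sketch of how a consumer might split such a path. The `parse_resource_path` helper and the `ResourcePath` type are hypothetical, and the sketch assumes CTIDs take the form `ce-` followed by a UUID, as in the sample URIs elsewhere in this thread. It is not an agreed Registry API.

```python
import re
from typing import NamedTuple, Optional

UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Hypothetical path shapes: /resources/[CTID] or /resources/[CTID]/[UUID]
PATH_RE = re.compile(rf"^/resources/(ce-{UUID_RE})(?:/({UUID_RE}))?$")


class ResourcePath(NamedTuple):
    framework_ctid: str             # CTID issued to the framework
    competency_uuid: Optional[str]  # UUID of a competency under it, if any


def parse_resource_path(path: str) -> Optional[ResourcePath]:
    """Split a path into framework CTID and optional competency UUID."""
    match = PATH_RE.match(path.lower())
    if match is None:
        return None
    return ResourcePath(match.group(1), match.group(2))
```

With this shape, only the framework carries a CTID, and a competency is addressable only relative to its framework.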

Concept Schemes (CTDL-SKOS?)

Relates to Concept Schemes and Concepts (see #522)
Must enable each concept scheme and concept to be resolvable as a top-level thing
May mimic the approach used for competencies, but it probably isn't necessary to issue a CTID to concepts(?)

Multiple Languages

Relates to all of the above
Must enable handling of multiple languages in the least difficult (for publishers and consumers) way possible per #514
Should probably use @language in the @context, having one language per JSON document (this will be the majority of cases) and requiring additional documents to be published for each additional language that describes a given thing
- Would such documents (describing the same semantic thing) each have their own CTID?
- Would such documents (describing the same semantic thing) each have their own Envelope?
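For comparison, here is a rough sketch of how several one-language documents about the same thing relate to a single document that uses language maps. The `merge_language_documents` helper is hypothetical. It assumes each document's @context is a plain object carrying @language, and that every plain string property is translatable text, which would not hold for URI-valued properties.

```python
from typing import Any, Dict, List


def merge_language_documents(docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold single-language documents about one thing into language maps."""
    merged: Dict[str, Any] = {}
    for doc in docs:
        # Assumption: @context is an object with a default @language.
        language = doc["@context"]["@language"]
        for key, value in doc.items():
            if key.startswith("@"):
                # Keywords such as @id and @type are language-independent;
                # keep the first value seen. @context itself is dropped.
                if key != "@context":
                    merged.setdefault(key, value)
            elif isinstance(value, str):
                # Assumption: plain strings are translatable text.
                merged.setdefault(key, {})[language] = value
            else:
                merged.setdefault(key, value)
    return merged
```

The merged result keeps one @id for the thing, which is one way to look at the CTID and envelope questions above: the per-language approach multiplies documents, while language maps keep a single one.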

JSON Validation

Relates to all of the above
Must enable JSON Schema Validation to support all of the above

Credential Registry

Must decide whether or not to keep the signatures, as this will impact how the above is handled
The above may relate to handling of related objects and named graphs
The above will impact how searches/queries/graphing are implemented
Will the registry house concept schemes/concepts?
- What will URIs for those look like?
  - The same structure as everything else, e.g. /resources/[CTID]
    - Requires issuing CTIDs to concepts
  - A schema-style structure, e.g. /vocabs/conceptName/concept
    - Requires a custom implementation different from everything else, which has implications for code and documentation
  - Enable both?
    - Requires issuing CTIDs to concepts
    - Requires each concept to have its own envelope

CASS

Will likely be the storage place for concept schemes and concepts due to their structural similarity to frameworks and competencies
The above may relate to signatures

Problems and Proposals

Currently, the Registry structure:

Uses an envelope to contain data for a payload. That payload is a single CTDL object. In order to implement blank nodes, that payload will need to either:
- Become a @graph array where one node is the current payload, and subsequent nodes are the blank nodes
  - This opens the door to publishing multiple top-level things in one envelope, which:
    - May be desirable in the case of publishing competency frameworks with their competencies
    - Breaks the usefulness of envelopes in relating to a specific CTID
    - Has implications for updating data (versioning)
    - Has implications for search/retrieval of data:
      - How to retrieve an envelope for a competency/concept if there isn't one? Would this ever be necessary?
      - How to search for things where things are in @graphs
      - Would bnode data be included in searches?
- Become an object that contains one "main" entity and an array of bnodes
  - How would this be handled in terms of JSON-LD?
    - Does the scope of a bnode identifier allow looking for references outside of a @graph as long as they are somewhere in the JSON document? Is that valid JSON-LD?
  - This may be confusing for publishing frameworks and concept schemes, as the temptation would be to publish the framework as the "main" entity and the competencies in the bnode array. However, each competency should be a "main" entity as well
    - Unless you go with the /resources/[CTID for framework]/[UUID for competency] approach noted above
    - For concepts, this may instead use a schema-like structure, i.e., /vocabs/[Concept Scheme URI]/[Concept URI], e.g. /vocabs/costType/TechnologyFee
Does not have handling for multiple languages (by design - we skipped doing this to focus on other things during the CE launch)
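Whichever payload shape is chosen, one conservative rule would be to require every blank node reference to resolve to a node in the same @graph. The following is a sketch of such a check, not a decision from this thread. The `find_dangling_bnodes` helper is hypothetical, and it assumes a flat @graph and that any string value starting with `_:` is a blank node identifier.

```python
from typing import Any, Dict, Iterator, Set


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string value found anywhere inside a JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)


def find_dangling_bnodes(payload: Dict[str, Any]) -> Set[str]:
    """Return blank node ids referenced in @graph but not defined in it."""
    graph = payload.get("@graph", [])
    # Blank nodes defined as top-level entries of the @graph array.
    defined = {
        node["@id"]
        for node in graph
        if isinstance(node.get("@id"), str) and node["@id"].startswith("_:")
    }
    # Every string in the graph that looks like a blank node identifier.
    referenced = {s for s in iter_strings(graph) if s.startswith("_:")}
    return referenced - defined
```

An empty result means every bnode reference stays inside the envelope's own graph, which sidesteps the scope question for the "main entity plus bnode array" shape.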

Currently, CASS:

Uses language maps, which (pending the results of #514) are likely not the preferred approach after all

So, we have a complex and interwoven web of issues where solutions to one will influence (if not outright determine/block) solutions to others. I am not sure of the best way to handle this short of proposing and walking through entire solution stack proposals - but maybe that would be worth doing?

I think this can all be handled with one model or set of rules for modeling data - but we all must be on the same page about that solution and how it impacts (or is impacted by) all of our more localized use cases/issues/etc.

Flagging down @stuartasutton @science @lomilar @cwd-mparsons to get their thoughts (though I have discussed this with Mike some internally).

stuartasutton commented 6 years ago

Must we, really? If it is needed, can't we simply assign a CTID-based URI and have the local system software grab the CTID from the URI if it needs it for local machinations?

On Thu, Apr 5, 2018 at 4:15 PM, siuc-nate notifications@github.com wrote:

I've been discussing this implementation with @cwd-mparsons and we have a question:

Is there any reason not to include the ceterms:ctid at the @graph level? This should:

Make it possible to use existing Registry software to retrieve records by CTID (since it will be at the root level of the payload)

Hopefully help work around the URI issue we're wrestling with

siuc-nate commented 6 years ago

Correct me if I'm wrong, @cwd-mparsons , but I think the need for that hinges on whether or not the Registry depends on the CTID existing in the root level of the payload.

siuc-nate commented 6 years ago

@cwd-mparsons @stuartasutton @Lomilar I have updated the credreg.net site to:

Add the missing { "@type": "@id" } designations for anything that points to a top-level class
Remove references in the JSON schema to the use of pointer/reference objects
Clean up various things on the back end now that I don't have to hack pointer/reference objects into the code
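For anyone checking the context files, a term defined with { "@type": "@id" } tells a JSON-LD processor to treat string values of that term as IRIs rather than plain text. Here is a small sketch for listing such terms in a context object. The `id_typed_terms` helper and the sample context are made up for illustration and are not copied from the credreg.net files.

```python
from typing import Any, Dict, List


def id_typed_terms(context: Dict[str, Any]) -> List[str]:
    """List the terms whose definition coerces values to IRIs."""
    return sorted(
        term
        for term, definition in context.items()
        if isinstance(definition, dict) and definition.get("@type") == "@id"
    )


# Hypothetical context fragment, for illustration only.
sample_context = {
    "ceterms:targetAssessment": {"@type": "@id"},
    "ceterms:name": {"@container": "@language"},
    "ceterms:ownedBy": {"@type": "@id"},
}
print(id_typed_terms(sample_context))
```

Comparing that list against the properties that point to top-level classes is one way to spot a missing designation.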

Please verify that these context files now contain everything they should: http://credreg.net/ctdl/schema/context/json http://credreg.net/ctdlasn/schema/context/json

Note: The current code that generates the JSON Schema Validation documents has not been updated to reflect the use of @graph, because:

It likely isn't necessary right now, because all publishing flows through the API/publisher, which has its own validation layer
No one (as far as we know) is or should be using it as the basis for their publishing (we don't currently have a way to support partners who wish to publish via raw CTDL directly - the assistant API's format is the only supported way unless @cwd-mparsons knows something I don't)
None of the documentation has been updated yet to reflect the use of @graph or language maps
I have other priorities

cwd-mparsons commented 6 years ago

@stuartasutton @siuc-nate @Lomilar I have tested:

1. Publishing just the graph, with no external CTID and no type
2. Publishing with a CTID at the same level as the graph, and a type

For option 1, the document can only be retrieved by envelope, not by CTID: https://sandbox.credentialengineregistry.org/envelopes/b86629e9-87e5-4db0-ac43-38a85d321b79 It also will not be found by a search on competency_framework: https://sandbox.credentialengineregistry.org/ce-registry/search?resource_type=competency_framework

For option 2, the document can be retrieved by ctid, and can be found in the search. https://sandbox.credentialengineregistry.org/resources/ce-e15c3347-e7cd-367d-9408-0ae9a595e4fb

I encountered a strange error: if the CTID at the graph level is different from the one for the competency framework, I get "Not enough or too many segments". That seems a very strange error for this case, but it could be related to something in the Registry. I have been testing publishing with validation turned off, so the error should not be schema related.

We have asked the registry team to investigate the implications of the Ctid only being inside the graph.

I think that for the specific case of publishing to the registry, we should include a CTID (the same as that for the competency framework) at the same level as the graph, along with the type.
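As a rough illustration of option 2, the publisher could copy the framework's CTID and type to the root of the payload, next to @graph. The `add_root_ctid` helper is hypothetical, and the root-level key names (`ceterms:ctid`, `@type`) are assumptions drawn from this discussion rather than a confirmed Registry requirement.

```python
from typing import Any, Dict


def add_root_ctid(graph_document: Dict[str, Any], primary_type: str) -> Dict[str, Any]:
    """Return a copy of the payload with the primary node's CTID and type at the root."""
    for node in graph_document.get("@graph", []):
        if node.get("@type") == primary_type:
            wrapped = dict(graph_document)  # shallow copy; the input is left untouched
            wrapped["ceterms:ctid"] = node["ceterms:ctid"]
            wrapped["@type"] = node["@type"]
            return wrapped
    raise ValueError(f"no node of type {primary_type!r} found in @graph")
```

Because the root CTID is copied from the framework node rather than supplied separately, the two can never disagree, which avoids the mismatch case that produced the error above.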

siuc-nate commented 6 years ago

@stuartasutton @Lomilar @cwd-mparsons Where are we at on this? @stuartasutton did you get a chance to look at the context file changes in my above comment?

We met with the Credential Registry team a little while ago - they should be able to handle the changes but are discussing things internally (last I heard).

Lomilar commented 6 years ago

I haven't taken any action and don't have much of an opinion, since CTID is an internal identifier.

I believe that I only object to the @id of the graph being the same as the @id of the framework.

stuartasutton commented 6 years ago

I'm with @Lomilar on the @id for the graph being a unique URI of the form https://credentialengineregistry.org/graph/[UUID] and NOT being the same as the @id of the "top-level" entity in the graph (however, I'm not wed to the CTID form for this URI). I'll leave whether the graph itself should also have a CTID up to you guys. I've already complained too much about the CTIDs.

siuc-nate commented 6 years ago

I would prefer to have a URI for the graph that ends in the same CTID as the "main" resource in the graph itself, so that it's easy to figure out one URI or the other if you know the CTID. That would simplify documentation, implementation, and allow for advice along the lines of "To get all of the relevant data for this resource, use the /graph/ endpoint with the resource's CTID" (with a bit of additional explanation that it needs to be the CTID of the framework for competency framework graphs).
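Here is a minimal sketch of what that convention buys a consumer, assuming the base URL used in this thread and that the CTID is always the last path segment. The helper names are hypothetical.

```python
REGISTRY_BASE = "https://credentialengineregistry.org"


def graph_uri(ctid: str) -> str:
    """URI of the graph whose primary resource has this CTID."""
    return f"{REGISTRY_BASE}/graph/{ctid}"


def resource_uri(ctid: str) -> str:
    """URI of the resource itself."""
    return f"{REGISTRY_BASE}/resources/{ctid}"


def ctid_from_uri(uri: str) -> str:
    """Take the last path segment, assumed here to be the CTID."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def to_graph_uri(uri: str) -> str:
    """Turn a /resources/ URI (or a /graph/ URI) into the /graph/ URI."""
    return graph_uri(ctid_from_uri(uri))
```

If the graph instead used its own UUID, as suggested earlier, none of these conversions would be possible without a lookup.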

science commented 6 years ago

This may seem like an impossibly basic or ignorant question (forgiveness in advance, requested).

Are we talking about publishing envelope changes or resultset data changes? I see some mention of both above. That is, are we returning \@ graph structures as results, or are we allowing orgs to publish \@ graph statements?

If the latter is considered (and I think it is), I'm a little worried about republishing the same entity again and again - for example publishing multiple credentials with the same competencies would result in the same competency published multiple times? (Or consider the same question with organizations and credentials).

Am I missing a critical part of this conversation? Thanks for any education and enlightenment. (I tried to escape \@ graph so it wouldn't hassle the \@ graph user but that apparently failed - sorry graph, but I'd guess they're used to it)

siuc-nate commented 6 years ago

Per @stuartasutton, to summarize so far:

We will implement language maps as originally planned
We will implement @graph at the root of the decoded_payload
We will implement blank nodes in the @graph
We will implement Competency Frameworks and Competencies in the same @graph
We will implement an @id for the @graph using a URI that has /graph/ instead of /resources/ (see below)
The /graph/ URI will share the same CTID as the "primary" resource within the graph, e.g. https://credentialengineregistry.org/graph/ce-b69aa3a7-3f58-442f-9539-291ea29cc958 and https://credentialengineregistry.org/resources/ce-b69aa3a7-3f58-442f-9539-291ea29cc958 (see below)
We will encourage using the /graph/ URI as opposed to the /resources/ URI
Retrieving something via its /resources/ URI will return just that resource (and associated @context), even if the resource references other nodes (even blank nodes)

Example Source Data:

{
  "envelope_id": "04ca4351-47d8-4bc5-ad2e-11704ee99277",
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@id": "https://credentialengineregistry.org/graph/ce-b69aa3a7-3f58-442f-9539-291ea29cc958",
    "@graph": [
      {
        "@id": "https://credentialengineregistry.org/resources/ce-b69aa3a7-3f58-442f-9539-291ea29cc958",
        "@type": "ceterms:Certification",
        "ceterms:name": {
          "en-US": "My Credential Name"
        },
        "ceterms:requires": [
          {
            "ceterms:targetAssessment": [
              "https://credentialengineregistry.org/graph/ce-317bbd77-4375-4434-bcf4-1effc3398ed6",
              "_:bfb140c3-8b62-4d9a-a2f4-c2ce8cf65054"
            ]
          }
        ]
      },
      {
        "@id": "_:bfb140c3-8b62-4d9a-a2f4-c2ce8cf65054",
        "ceterms:name": {
          "en-US": "My referenced assessment"
        }
      }
    ]
  }
}

If you resolve https://credentialengineregistry.org/graph/ce-b69aa3a7-3f58-442f-9539-291ea29cc958:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "@id": "https://credentialengineregistry.org/graph/ce-b69aa3a7-3f58-442f-9539-291ea29cc958",
  "@graph": [
    {
      "@id": "https://credentialengineregistry.org/resources/ce-b69aa3a7-3f58-442f-9539-291ea29cc958",
      "@type": "ceterms:Certification",
      "ceterms:name": {
        "en-US": "My Credential Name"
      },
      "ceterms:requires": [
        {
          "ceterms:targetAssessment": [
            "https://credentialengineregistry.org/graph/ce-317bbd77-4375-4434-bcf4-1effc3398ed6",
            "_:bfb140c3-8b62-4d9a-a2f4-c2ce8cf65054"
          ]
        }
      ]
    },
    {
      "@id": "_:bfb140c3-8b62-4d9a-a2f4-c2ce8cf65054",
      "ceterms:name": {
        "en-US": "My referenced assessment"
      }
    }
  ]
}

If you resolve https://credentialengineregistry.org/resources/ce-b69aa3a7-3f58-442f-9539-291ea29cc958:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "@id": "https://credentialengineregistry.org/resources/ce-b69aa3a7-3f58-442f-9539-291ea29cc958",
  "@type": "ceterms:Certification",
  "ceterms:name": {
    "en-US": "My Credential Name"
  },
  "ceterms:requires": [
    {
      "ceterms:targetAssessment": [
        "https://credentialengineregistry.org/graph/ce-317bbd77-4375-4434-bcf4-1effc3398ed6",
        "_:bfb140c3-8b62-4d9a-a2f4-c2ce8cf65054"
      ]
    }
  ]
}
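The /resources/ behavior in the summary can be sketched as a lookup over the stored @graph. This is an illustration of the rule only, not the Registry's actual code, and the `resolve_resource` helper is hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_resource(decoded_payload: Dict[str, Any], uri: str) -> Optional[Dict[str, Any]]:
    """Return the single node with this @id, plus the payload's @context."""
    for node in decoded_payload.get("@graph", []):
        if node.get("@id") == uri:
            # Start with the shared @context, then copy the node's own keys.
            result: Dict[str, Any] = {"@context": decoded_payload["@context"]}
            result.update(node)
            return result
    return None
```

Note that any blank node references in the returned resource are left as they are, so a consumer that needs the referenced data has to fetch the /graph/ URI instead.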

Lomilar commented 6 years ago

Thank you Nate for this summary. I saw this request and my brain broke trying to remember everything.

+1

siuc-nate commented 6 years ago

To summarize: This issue is basically solved, but we're keeping it open for now as a reference for implementation.

jeannekitchens commented 6 years ago

@stuartasutton @Lomilar @siuc-nate @cwd-mparsons @science we need to meet in June and finalize the data design and put a deadline on the related work.

siuc-nate commented 6 years ago

I have created a Google document to describe the implementation details: https://docs.google.com/document/d/1rCEEMD4eKPpVPANsz_zOOPQ70otJzuFEecMUtzynHrc

jeff-grann commented 6 years ago

This Digital Competence framework for citizens (DigComp) could be used to illustrate how multiple languages are supported.

siuc-nate commented 5 years ago

Per our 4-9-2019 meeting: Closing this issue (finally!) as it has been implemented across our system.

CredentialEngine / Schema-Development

Data Design Issue #521
