CredentialEngine / Schema-Development

Development of the vocabularies for the CTI models

Data Design Issue #521

Closed siuc-nate closed 5 years ago

siuc-nate commented 6 years ago

Discussion of #508 led to uncovering deeper issues with our data design as it relates to JSON-LD, the Registry, CASS, signatures, etc. I will attempt to document this as clearly as possible. We need to align all of our systems to be able to handle the following:

Situation

CTDL

CTDL-ASN

Concept Schemes (CTDL-SKOS?)

Multiple Languages

JSON Validation

Credential Registry

CASS

Problems and Proposals

Currently, the Registry structure:

Currently, CASS:

So, we have a complex and interwoven web of issues where solutions to one will influence (if not outright determine/block) solutions to others. I am not sure of the best way to handle this short of proposing and walking through entire solution stack proposals - but maybe that would be worth doing?

I think this can all be handled with one model or set of rules for modeling data - but we all must be on the same page about that solution and how it impacts (or is impacted by) all of our more localized use cases/issues/etc.

Flagging down @stuartasutton @science @lomilar @cwd-mparsons to get their thoughts (though I have discussed this with Mike some internally).

siuc-nate commented 6 years ago

I suggest we tackle this step by step:

  1. Figure out the rules around language handling, specifically the stuff here: https://github.com/CredentialEngine/vocabularies/issues/514#issuecomment-371908719
  2. Get a stronger sense of how we will handle concept schemes, as these have not been discussed in any great detail
  3. Determine what kind of URI structure competency frameworks/competencies and concept schemes/concepts will have
  4. Determine the best structure for Registry payloads that handles cases where one top-level object is or is not desirable (e.g. CTDL object vs competency framework and competencies)
  5. Ensure that there are no problems with URIs, signatures, etc. as they relate to both the Registry and CASS
  6. Document/describe/visualize how the solutions we come up with will work across our systems
  7. Identify and update code and documentation accordingly

I am certainly open to other ideas for handling this, however.

siuc-nate commented 6 years ago

Recording progress so far:

The issue seems to settle into three major (and overlapping) categories:

The core of the solution to all three of these is to make the payload structure of data in the registry contain a @graph rather than just being the relevant top-level object. For instance, instead of:

{
  "envelopeID": "[GUID]",
  "decoded_payload": {
    "@context": { ... },
    "@type": "ceterms:Certificate",
    ... //Other properties
  }
}

You would use:

{
  "envelopeID": "[GUID]",
  "decoded_payload": {
    "@context": { ... },
    "@graph": [
      {
        "@type": "ceterms:Certificate",
        ...//Other properties
      }
    ]
  }
}

Note that the @context block is moved to the @graph level, and is no longer inside of the Credential.
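For anyone who wants to see the restructuring mechanically, here is a minimal Python sketch of the conversion described above. It is only an illustration: `wrap_in_graph` is a hypothetical helper (not part of any Registry code), and the envelope ID and context values are placeholders.

```python
import copy


def wrap_in_graph(envelope):
    """Return a copy of the envelope whose payload holds an @graph.

    Hypothetical helper for illustration; not an existing Registry function.
    """
    wrapped = copy.deepcopy(envelope)
    document = wrapped["decoded_payload"]
    # Lift @context out of the top-level object up to the @graph level.
    context = document.pop("@context", None)
    payload = {}
    if context is not None:
        payload["@context"] = context
    # The former top-level object becomes the first entry of the @graph.
    payload["@graph"] = [document]
    wrapped["decoded_payload"] = payload
    return wrapped


old = {
    "envelopeID": "example-guid",  # placeholder value
    "decoded_payload": {
        "@context": {"ceterms": "http://example.org/ceterms/"},  # placeholder context
        "@type": "ceterms:Certificate",
    },
}
new = wrap_in_graph(old)
```

After the call, `new["decoded_payload"]` has `@context` and `@graph` as siblings, and the Certificate object inside the `@graph` no longer carries its own `@context`. The input envelope is left untouched.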

What this enables:

So I think it largely solves the problems we have. However, it does require changes across our systems:

That's all I can think of off the top of my head. Feel free to comment/expand/etc.

siuc-nate commented 6 years ago

Can we try to come to a consensus on how best to move forward?

So far, I believe part 1 of the solution is to make the Registry payload an object that contains a @graph that contains the rest:

{
  "envelope_id": "",
  "decoded_resource": {
    "@graph": [
      { ... }, //Top-level CTDL object
      { ... }, //bnode (aka "reference" or "pointer" object)
      { ... }, //bnode (aka "reference" or "pointer" object)
    ]
  }
}
{
  "envelope_id": "",
  "decoded_resource": {
    "@graph": [
      { ... }, //Competency Framework
      { ... }, //Competency
      { ... }, //Competency
    ]
  }
}

And part 2 of the solution is to use the @language property in the @context. But where does that @context go?

{
  "envelope_id": "",
  "decoded_resource": {
    "@context": { ... }, //@context at the @graph level
    "@graph": [
      { ... }, //Top-level CTDL object
      { ... }, //bnode (aka "reference" or "pointer" object)
      { ... }, //bnode (aka "reference" or "pointer" object)
    ]
  }
}
{
  "envelope_id": "",
  "decoded_resource": {
    "@graph": [
      {  //Top-level CTDL object
        "@context": { ... }, //@context at the top-level-in-the-@graph level
        ...
      },
      { ... }, //bnode (aka "reference" or "pointer" object - would these need a @context?)
      { ... }, //bnode (aka "reference" or "pointer" object - would these need a @context?)
    ]
  }
}

I think the answer is "both":

{
  "envelope_id": "",
  "decoded_resource": {
    "@context": { ... }, //@context at the @graph level, defines schema and default @language
    "@graph": [
      {  //Top-level CTDL object
        "@context": { ... }, //@context at the top-level-in-the-@graph level, exists only if necessary (e.g. to provide an alternate language - everything else is inherited from the @graph level)
        ...
      },
      { ... }, //bnode (aka "reference" or "pointer" object - @context inherited from @graph level)
      { ... }, //bnode (aka "reference" or "pointer" object - @context inherited from @graph level)
    ]
  }
}

This also enables a rare use case such as having a subset of competencies in a framework that have metadata in two languages (if the entire framework has two metadata languages, it might be better to publish a separate payload altogether with separate CTIDs and relate them with ceasn:exactAlignment?)

{
  "envelope_id": "",
  "decoded_resource": {
    "@context": { //@context at the @graph level, defines schema and default @language
      "@language": "en",
      ...
    },
    "@graph": [
      { ... }, //Competency Framework
      { ... }, //Competency with English metadata
      { //Same CTID/Competency as above, but with French metadata
        "@context": { "@language": "fr" },
        ...
      },
      { ... }, //Competency with English metadata
      { ... }, //Competency with English metadata
    ]
  }
}

I think this should solve the issues above. We need to decide, carefully and quickly.
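To make the "both" arrangement concrete from a consumer's side, here is a rough Python sketch of how the default language and per-object overrides might be read. The function names are hypothetical, the sample values are invented, and the sketch makes simplifying assumptions: remote context URLs are not fetched, and an explicit `"@language": null` reset is treated the same as no declaration.

```python
def context_language(context):
    """Return the @language declared by a context, or None."""
    if isinstance(context, dict):
        return context.get("@language")
    if isinstance(context, list):
        # Later entries in a context array override earlier ones.
        found = None
        for entry in context:
            entry_language = context_language(entry)
            if entry_language is not None:
                found = entry_language
        return found
    return None  # context given as a URL (or absent): not resolved here


def effective_language(payload, node):
    """Language applying to plain strings in `node` inside `payload`."""
    local = context_language(node.get("@context"))
    if local is not None:
        return local
    return context_language(payload.get("@context"))


def variants_by_language(payload, ctid):
    """Group the @graph objects sharing a CTID by their effective language."""
    return {
        effective_language(payload, node): node
        for node in payload.get("@graph", [])
        if node.get("ceterms:ctid") == ctid
    }


payload = {
    "@context": {"@language": "en"},
    "@graph": [
        {"ceterms:ctid": "ce-framework", "ceasn:name": "Framework"},
        {"ceterms:ctid": "ce-comp-1", "ceasn:competencyText": "Count to ten"},
        {
            "@context": {"@language": "fr"},
            "ceterms:ctid": "ce-comp-1",
            "ceasn:competencyText": "Compter jusqu'a dix",
        },
    ],
}
variants = variants_by_language(payload, "ce-comp-1")
```

Here `variants` has one entry per language for the competency that was published twice, which is roughly what a consumer interested in a single language would need in order to pick its copy.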

Lomilar commented 6 years ago

My vote is still for langstrings.

https://json-ld.org/spec/latest/json-ld/#language-indexing

Not following the spec literally and in spirit would be a step towards incompatibility for users of the system.

Langstrings are in the JSON-LD spec. The break from JSON-friendliness already began when namespace scopes were included in the field names. Use of @graph is another break from JSON-friendly methods.

Using different CTIDs for different translations of the same degree would require another layer of alignment (sameAs?) that would be more expensive (and exist in fewer libraries) than langstrings.

Langstrings can be boiled out through use of JSON-LD processors and recontextualization, so if they want to go to https://credentialengineregistery.org/resources/?lang=en or use the header accept-language, those still are possible if langstrings are used, but not if the different language-records have different CTIDs.

Anyone using CTDL data (or that uses JSON-LD) will be used to adding getters to make langstring extraction natural.
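As an illustration of the kind of getter meant here, a minimal Python sketch follows. `get_text` is a hypothetical name, the sample values are invented, and the fallback order (requested language, then a default, then any available language) is just one possible policy. It assumes each language maps to a single string, although JSON-LD language maps can also hold arrays.

```python
def get_text(value, lang, fallback="en"):
    """Pick one string out of a plain string or a JSON-LD language map."""
    if isinstance(value, str):
        return value  # single-language data needs no lookup
    if isinstance(value, dict):
        if lang in value:
            return value[lang]
        if fallback in value:
            return value[fallback]
        # Last resort: any available language, chosen deterministically.
        for key in sorted(value):
            return value[key]
    return None


name = {"en": "Certificate in Welding", "fr": "Certificat en soudage"}
french = get_text(name, "fr")
german = get_text(name, "de")  # not present, so the "en" fallback is used
```

The same call works whether a publisher supplied a plain string or a language map, which is the convenience being argued for.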

stuartasutton commented 6 years ago

We should follow the JSON-LD spec both literally and in spirit...and that includes @graph.

On Wed, Mar 21, 2018 at 12:08 PM, Lomilar notifications@github.com wrote:

My vote is still for langstrings.

https://json-ld.org/spec/latest/json-ld/#language-indexing

Not following the spec literally and in spirit would be a step towards incompatibility for users of the system.

Langstrings are in the JSON-LD spec. The break from JSON-friendliness already began when namespace scopes were included in the field names. Use of @graph is another break from JSON-friendly methods.

Using different CTIDs for different translations of the same degree would require another layer of alignment (sameAs?) that would be more expensive (and exist in fewer libraries) than langstrings.

Langstrings can be boiled out through use of JSON-LD processors and recontextualization, so if they want to go to https://credentialengineregistery.org/resources/?lang=en or use the header accept-language, those still are possible if langstrings are used, but not if the different language-records have different CTIDs.

Anyone using CTDL data (or that uses JSON-LD) will be used to adding getters to make langstring extraction natural.


siuc-nate commented 6 years ago

Use of @language in the @context is in the spec: https://json-ld.org/spec/latest/json-ld/#string-internationalization

The currently-proposed solution has different language versions of metadata using the same CTID. See https://github.com/CredentialEngine/vocabularies/issues/514#issuecomment-372769710

Language maps are, in my opinion, also a burden to any publisher or consumer that is only interested in one language. They make examples harder to understand and documentation trickier to write.

Can you clarify what you mean about @graph being a break from JSON (did you mean a break from JSON-LD?) - I want to make sure I'm interpreting you correctly there.

Lomilar commented 6 years ago

You are absolutely correct that @language is in the spec, and is intended for indicating the default language of a non-multi-lingual object.

If the @graph of same-CTID objects only applies to the envelope, I don't think that's a problem because there's very little use for the envelope beyond book-keeping. Returning a @graph of objects if I navigate to https://cer/resources/<CTID> though is unacceptable (unless it is referring to a statement set), because it requires me as a dumb developer and my dumb JS code to understand what was done and why it was done. Linked Open Data is, in part, about getting away from having to understand any particular system.

I'm also not sure what this pattern does for signatures. If the @id url returns a different resource than was signed by the envelope, how do I computationally verify the signatures?

--

@graph being a break from JSON

JSON-LD was intended to, among many things, serve as a bridge between JSON and RDF, allowing transforms from JSON objects to RDF (JSON-LD) and back with minimum fuss (via very complex @context).

This is, for instance, how we transform IMS CASE (which is just JSON) to CASS/Schema JSON-LD.

http://schema.cassproject.org/0.3/case2cass

and back

http://schema.cassproject.org/0.3/cass2case

Unfortunately, this capability is limited to fairly shallow mappings. Changes in structure, linking, use of statement sets (@graph), langstring transformations, etc (such as in example 39 in the spec) have varying success, but often require coding.

"@graph breaking from JSON" simply referred to using features of JSON-LD that distanced it from being plain-JSON compatible. I suppose the JSON version of an @graph would be an array of objects, or a result object with status and an array of results... but those transforms are not supported by current JSON-LD processors, I don't think.

siuc-nate commented 6 years ago

Thanks for the clarification on @graph and JSON/JSON-LD.

If I could expand for a moment on another part of my reasoning for using @language in the @context: Back when we were first looking at the problem, we liked that it kept the data simple, especially with regard to documents that only have a single language version, but the problem it presented was that you would need a @graph of top-level documents if you had more than one language (even if those documents had the same CTID). That wasn't enough of a justification at the time to make the switch to @graph, so we proceeded with language maps instead.

Some time thereafter, we determined that the best way to handle blank nodes would be through the use of a @graph at the root of the payload. That sort of started the ball rolling. More recently, we determined that in order to make publishing and consuming competency frameworks (and concept schemes) as easy as possible, we would also need to use a @graph of top-level objects.

Taken together, these were a much stronger justification for using a @graph as the root of the payload, especially since it opened the door to using @language-@context'd documents in a graph instead of the somewhat clunky language maps. It seemed like a way to kill three birds with one stone, hence my pushing for it at this point.

I don't deny that language maps have their place, but I think that since we're using a @graph anyway, we might as well use it to solve the language problem (on top of the other arguments).


Anyway, that aside, your points about the CTID/envelope/signature relationship are valid, but I think the pros outweigh the cons, in the end. Let me try to address your points individually:

Returning a @graph of objects if I navigate to https://cer/resources/<CTID> though is unacceptable (unless it is referring to a statement set)

The way this is intended to work is something close to a statement set (or perhaps exactly that, depending on how you interpret the following):

Thus the @graph for a credential's CTID would contain data related to that credential, so this seems valid in my opinion.

This may be more in-line with your concerns, but the competencies would be "extra" data that could effectively be ignored. In most cases, I think they would be useful - and this would be the most efficient way to get all competencies for a given framework, as it requires no extra computation or retrieval on the part of either the consumer or the registry itself, since they are already present in the published payload document. Alternatively, we could figure out some means of indicating whether or not you want to receive the competencies too, or just the framework (though this would still be a @graph of objects in order to handle the languages).

This allows retrieval of a competency. For this case, the @graph is still needed to handle multiple language use cases. To validate signatures for a competency, retrieve the framework document it is part of (via isPartOf) and validate that. I don't know how often it would be necessary to validate signatures for individual competencies, especially if the consumer is getting the data straight from the registry. This is a downside, admittedly, but I do think the benefits outweigh it.

I'm also not sure what this pattern does for signatures. If the @id url returns a different resource than was signed by the envelope, how do I computationally verify the signatures?

This should still work the way it does now - the payload that was published is the payload you get. The publisher will publish a @graph of resources as described above, and that's what would be signed, that's what would be returned when retrieved via @id. So nothing here should break. Or am I misinterpreting you?

Lomilar commented 6 years ago

Using @graph for CTDL-ASN CompetencyFrameworks makes sense because a CompetencyFramework really is a highly described statement set (were @stuartasutton redoing things, I expect he may have made CompetencyFramework extend StatementSet/@graph). Each child object is contextualized to only that CompetencyFramework (via the 2 object model and many discussions).

Concept schemes arguably have sharable concept nodes, but I don't mind either way, so treating concepts as being local to the ConceptScheme is fine as well, but they need to be individually identifiable within a scheme, so they all have an @id.

Those @ids should locate and return the concept. Similarly, the @id of a Competency should locate and return the Competency.

As far as access to the data goes, it tends to be beneficial if every object has a URL @id, because _bnodes are next to useless to web developers, because then I have to store the location of that _bnode as something like http://cer/resources/<ctid>#<bnodeId>.

And bnodes ids are randomly generated (yes?), so you can see how that breaks real fast upon the next version publish.

@id url returns a different resource than was signed by the envelope

Got it, I wasn't clear that the payload published is the payload you get, it seemed like that was becoming a malleable concept. (or it could be, at least)

siuc-nate commented 6 years ago

Yes, the intent would be that each concept be retrievable on its own, just as each competency should be.

Regarding bnodes: In terms of linked data and how it gets used, Stuart is probably better-equipped to respond than I am - but as a developer, I'm not sure where I would ever need to link directly to a bnode's data any more than I would ever need to link directly to the data for a ConditionProfile, or a ProcessProfile, or any of the other non-top-level classes in CTDL. What would a use case be for linking to a bnode directly? I admit I am somewhat biased due to my background in web/interface development.

In general, the bnodes thing came out of a conversation that happened because we are/were using both URIs and objects as values for the same properties, depending on whether or not there was a published, resolvable URI to reference. That was causing problems with the @context's definition of those properties, and we eventually arrived at bnodes as the currently-proposed solution.

Lomilar commented 6 years ago

When determining if an individual is qualified to enter a degree program:

Individual P meets ConditionProfile C, which is one of three ConditionProfiles in the degree program (one for local students, one for national students and one for international students).

How do I describe and save that statement?

stuartasutton commented 6 years ago

Bnodes are common fare in RDF and in JSON-LD. While nothing precludes assigning a URI to absolutely every resource in a description, we accept that bnodes have a utility within the scope of a graph--e.g., an instance of PostalAddress, ConditionProfile. After considerably going round and around, we accepted the inevitable need for bnodes for what you all are calling top-level entities when no URI is available--e.g., providing a brief description of an organization where there was no URI-named entity. Again, common fare.

But those bnodes describing things like an organization were originally done without a nodeID (i.e., there was no @id that resolved to something like _:12345678); these are the infamous "reference/pointer" objects. In the end, that's what caused problems with the property declarations in the @context. So, we've added bnode nodeIDs.

So far, none of the above is unusual in RDF. It's also not unusual in JSON-LD. So, while I may be missing something, I don't see any reason for a ruckus around bnodes identified by nodeIDs and bounded by @graph.

stuartasutton commented 6 years ago

Nate, you state: "as a developer, I'm not sure where I would ever need to link directly to a bnode's data any more than I would ever need to link directly to the data for a ConditionProfile, or a ProcessProfile, or any of the other non-top-level classes in CTDL"

Here's an example. You have a credential that references a bnode describing an organization because the org has not been URI-identified--i.e., the classic example of the infamous "reference/pointer" object. Now, you have more than one property referencing this organization bnode--ownedBy, offeredBy, renewedBy and revokedBy. That can be done either by repeating the bnode data four times, or by assigning it a nodeID, describing it once and referencing it via the nodeID as the object of the four properties.
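The "describe it once, reference it by nodeID" option can be sketched in Python as follows. This is only an illustration under assumptions: `resolve_references` is a hypothetical helper, the identifiers and organization name are invented, and references are assumed to be plain string IDs held in arrays.

```python
def resolve_references(graph, node, properties):
    """Map each property to the graph nodes its ID values point at."""
    by_id = {entry["@id"]: entry for entry in graph if "@id" in entry}
    resolved = {}
    for prop in properties:
        values = node.get(prop, [])
        if isinstance(values, str):
            values = [values]  # tolerate a single ID instead of a list
        resolved[prop] = [
            by_id[value]
            for value in values
            if isinstance(value, str) and value in by_id
        ]
    return resolved


graph = [
    {
        "@id": "http://example.org/resources/credential-1",  # placeholder URI
        "@type": "ceterms:Certificate",
        "ceterms:ownedBy": ["_:org1"],
        "ceterms:offeredBy": ["_:org1"],
        "ceterms:renewedBy": ["_:org1"],
        "ceterms:revokedBy": ["_:org1"],
    },
    {
        "@id": "_:org1",  # blank node, meaningful only inside this document
        "@type": "ceterms:CredentialOrganization",
        "ceterms:name": "Example Org",
    },
]
roles = ["ceterms:ownedBy", "ceterms:offeredBy", "ceterms:renewedBy", "ceterms:revokedBy"]
resolved = resolve_references(graph, graph[0], roles)
```

All four properties resolve to the same single organization object, so the description is stored once rather than four times.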

Lomilar commented 6 years ago

@stuartasutton Nothing we're doing here is illegal or even unusual, but _bnodes are URIs, not URLs, and URIs may be identifiable (_bnodes are only identifiable within a context, yes?) but they are not locatable.

That makes them:

Even a PostalAddress should, in some distant future, have a URL (or maybe a descriptive URI that can be generated from the fields).

stuartasutton commented 6 years ago

As I said, some say that providing all instances of resources with a URI is the way to go--no bnodes, none even with nodeIDs. That would have been doable and I would not have raised a peep. But, knowing what I know today, I would probably not have advised it since it would have cascaded--all those resources would then need a CTID, an envelope, a signature...and on down the rabbit hole.

By the way, in instances such as our "reference/pointer", being not "locatable" is a feature and not a bug.

Lomilar commented 6 years ago

By the way, in instances such as our "reference/pointer", being not "locatable" is a feature and not a bug.

100% onboard there.

Also, I should probably apologize as I argued as if we were doing this from scratch. The CTID/envelope/signature rabbit hole is understandable.

We have ours in CASS too. Versioning, what-fields-compose-the-signature-when-data-is-portable, and on.

siuc-nate commented 6 years ago

Edit: merged my responses into a single post:

When determining if an individual is qualified to enter a degree program:

Individual P meets ConditionProfile C, which is one of three ConditionProfiles in the degree program (one for local students, one for national students and one for international students).

How do I describe and save that statement?

Do you mean in the context of CTDL generally, or was this specifically in response to me asking for a use case where I might want to provide a URI directly to that condition profile (in my case, I wouldn't - the condition profile needs the context of its credential to make any sense, otherwise they are effectively "conditions for [undefined]")

Nate, you state: "as a developer, I'm not sure where I would ever need to link directly to a bnode's data any more than I would ever need to link directly to the data for a ConditionProfile, or a ProcessProfile, or any of the other non-top-level classes in CTDL"

Here's an example. You have a credential that references a bnode describing an organization because the org has not been URI-identified--i.e., the classic example of the infamous "reference/pointer" object. Now, you have more than one property referencing this organization bnode--ownedBy, offeredBy, renewedBy and revokedBy. That can be done either by repeating the bnode data four times, or by assigning it a nodeID, describing it once and referencing it via the nodeID as the object of the four properties.

That statement was in the context of Fritz asking about linking to it directly from the outside as a standalone/top-level thing. Your example seems more relevant to the context of a @graph, where exactly that solution is already proposed.

As I said, some say that providing all instances of resources with a URI is the way to go--no bnodes, none even with nodeIDs. That would have been doable and I would not have raised a peep. But, knowing what I know today, I would probably not have advised it since it would have cascaded--all those resources would then need a CTID, an envelope, a signature...and on down the rabbit hole.

This is correct, but it's only part of the picture - the main reason we had reference/pointer objects to begin with was to allow Entity A to describe/point to something owned by Entity B even if Entity B didn't publish data to the registry. We couldn't allow Entity A to "own" the data in the registry about something belonging to Entity B. Thus we needed something sufficiently descriptive/useful enough to point to/lightly describe Entity B's property while still clearly indicating that the data was not officially published by Entity B. Enter reference/pointer objects.

stuartasutton commented 6 years ago

Yes, Nate, that's the policy reason...

siuc-nate commented 6 years ago

To try to get this thread back on track a bit: Is there still objection to (or does anyone foresee problems with) the notions of:

It's fine if there are still problems with these; they're critical and we need to think them through - I just want to make sure we're not digressing too much.

stuartasutton commented 6 years ago

I can't address the "which is simpler to implement" or matters of implementing at the system level. So,

  1. Make the registry payload an object containing a @graph

    Yes

  2. Put the @context at the @graph level (and anywhere else that it's necessary, see #521 (comment) )

    Yes

  3. Use @language in the @context instead of language maps (with multiple copies of the relevant documents in each applicable language, sharing a CTID)

    Since there are several ways to do this in JSON-LD, I leave that to you and Fritz to hammer out so long as the solution does not make multi-language data difficult down the road.

  4. Store competency frameworks and competencies in the same graph

    Yes (if you mean named graph (@graph))

  5. Store concept schemes and concepts in the same graph

    Yes (if you mean named graph (@graph))

  6. Store credentials (or organizations, or assessments, or learning opportunities) and their associated bnodes in the same graph

    Yes (if you mean named graph (@graph); assuming bnodes here also includes "reference/pointer" objects)

  7. Put the CTID of the main document (credential or competency framework or concept scheme, etc) at the @graph level (and probably inside the relevant documents too) - this may not be necessary, but it may make it more convenient to lookup the records (in general, but especially if the registry's search/retrieval software is already written to expect a CTID at the root payload level)

    I've not a clue since I don't know the system constraints on its use.

siuc-nate commented 6 years ago

@stuartasutton just to be sure - can you provide a JSON-LD example of a named graph? I believe we may be talking about the same thing.

stuartasutton commented 6 years ago

I think we are: https://json-ld.org/spec/latest/json-ld/#named-graphs (section discussing @graph)

siuc-nate commented 6 years ago

Hmm...I think there may be a problem with that. I'm glad you brought it up, as this is exactly the kind of thing we need to catch and handle now rather than later. Try this:

{
  "envelope_id": "ABC123",
  "decoded_payload": {
    "@context": [ 
      "http://credreg.net/ctdl/schema/context/json",
      {
        "@language": "en"
      }
    ],
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "@graph": [
      {
        "@type": "ceterms:Credential",
        "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
        ...//Other properties
      }
    ]
  }
}

There are two things I want to note here: First, use of an "advanced context" per the JSON-LD spec (scroll down to example 35 here: https://json-ld.org/spec/latest/json-ld/#advanced-context-usage ). This seems unnecessarily complex.

Note that I bring this up because:

Second, and more problematic, is the use of the same @id for both the @graph and the ceterms:Credential - this seems like invalid JSON-LD to me. Since the @graph is intended to express the data that would otherwise have to be encoded in the Credential document itself (i.e. the blank nodes), would it be correct to remove the @id from the Credential document altogether? Then everything in the graph would effectively become a blank node, which also seems like a bad idea.

Thoughts on how to solve this? Keep in mind that we want whoever retrieves the data by its CTID to retrieve the entire graph.

Lomilar commented 6 years ago

Do you mean in the context of CTDL generally, or was this specifically in response to me asking for a use case where I might want to provide a URI directly to that condition profile (in my case, I wouldn't - the condition profile needs the context of its credential to make any sense, otherwise they are effectively "conditions for [undefined]")

From a user or observer's standpoint, yes, but from a processor's standpoint, it doesn't need to know what the credential is to determine if someone is qualified for this condition profile. Even if it did in the course of processing, it doesn't need to keep that data. It just needs to store a statement that says that a student is qualified for this condition profile. If the credential is presented without _bnodes (just as nested objects) then it would have to use some sort of XPath/JSONPath to identify the child object in the credential, or in the case of _bnodes, its bnode ID. Both are more fragile and more complex than URLs.

We probably don't need to pursue this argument further for now.

is the use of the same @id for both the @graph and the ceterms:Credential - this seems like invalid JSON-LD to me.

A statement set is a thing with (optionally) its own @id. You are correct in saying this is a problem, as it could break caching systems (do I cache the @graph or the Credential in the @id's slot?) and all sorts of things.

I'd recommend using a @id-less graph, or setting the @id of the graph to something different. The graph is just being used to return a container of results, similar to a JSON array of results in traditional JSON APIs. It's common enough practice.

I would say the envelope it came in is another candidate for the @id of the graph... but I think the envelope is also not a named graph.
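A small Python sketch of a check for the situation discussed here is given below. It is a project-level sanity check under assumptions, not JSON-LD validation: `id_problems` is a hypothetical helper, the URIs are placeholders, and it only inspects `@id` values that appear literally in the payload.

```python
from collections import Counter


def id_problems(payload):
    """Report @id reuse between a named graph and its member nodes."""
    # The payload-level @id names the graph only when @graph is present.
    graph_id = payload.get("@id") if "@graph" in payload else None
    counts = Counter(
        node["@id"] for node in payload.get("@graph", []) if "@id" in node
    )
    return {
        "graph_id_reused_by_node": graph_id is not None and graph_id in counts,
        "duplicate_node_ids": sorted(i for i, n in counts.items() if n > 1),
    }


named = {
    "@id": "http://example.org/resources/ce-123",  # placeholder URI
    "@graph": [
        {"@id": "http://example.org/resources/ce-123", "@type": "ceterms:Credential"},
    ],
}
unnamed = {
    "@graph": [
        {"@id": "http://example.org/resources/ce-123", "@type": "ceterms:Credential"},
    ],
}
named_report = id_problems(named)
unnamed_report = id_problems(unnamed)
```

With the graph-level `@id` dropped, as recommended above, the first flag clears. The second field is included because a graph holding several objects with the same `@id` raises a similar "which one do I cache" question.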

siuc-nate commented 6 years ago

I was looking through the JSON-LD spec again, and there may be another option that I think came up in some form much earlier in the thread (albeit without the JSON-LD spec notion behind it): use of @graph containers: https://json-ld.org/spec/latest/json-ld/#graph-containers - also, apparently the scope of bnodes is the document rather than the @graph ( https://json-ld.org/spec/latest/json-ld/#identifying-blank-nodes ), so the example below should be valid.

It would require addition of a meta property defined as "@container": "@graph", but "meta" is already in the @context, so then we could do something like:

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json?language=ru",
    "@type": "ceterms:Credential",
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "ceterms:ctid": "[CTID#123]",
    "ceterms:requires": [
      {
        "ceterms:targetAssessment": [
          "_:ABC", "_:DEF"
        ]
      }
    ],
    "meta:references": [
      {
        "@id": "_:ABC",
        ...//Other properties
      },
      {
        "@id": "_:DEF",
        ...//Other properties
      }
    ]
  }
}

But then we'd be back to the problem of using language maps (or requiring different-language versions of the data to be published as separate documents/envelopes/CTIDs altogether). I would instead lean towards using either an unnamed graph, or naming the graph but not the Credential...actually, consider this:

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json?language=en",
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "@graph": [
      {
        "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
        "@type": "ceterms:Credential"
      },
      {
        "@context": {
          "@language": "ru"
        },
        "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
        "@type": "ceterms:Credential"
      },
      {
        "@context": {
          "@language": "fr"
        },
        "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
        "@type": "ceterms:Credential"
      }
    ]
  }
}

We got hung up on making sure the CTIDs would be the same in the language issue (#514) and I think we forgot about the @ids therefore also being the same. Even if you didn't use a named graph here (no @id at the @graph level), you'd still be stuck with 1-n documents with the same @id.
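The collision described here can be made concrete with a small sketch: grouping the objects of such a @graph by @id shows all three language versions landing under one key. The function name and the way the default language is passed in are assumptions for illustration; the "ce-123" URI stands in for the CTID placeholder.

```python
from collections import defaultdict


def languages_by_id(document, default_language):
    """Group the @language of each object in @graph under its @id."""
    groups = defaultdict(list)
    for node in document["@graph"]:
        scoped = node.get("@context", {})  # per-object context, if any
        groups[node["@id"]].append(scoped.get("@language", default_language))
    return dict(groups)


URI = "http://credentialengineregistry.org/resources/ce-123"
document = {
    "@id": URI,
    "@graph": [
        {"@id": URI, "@type": "ceterms:Credential"},
        {"@context": {"@language": "ru"}, "@id": URI, "@type": "ceterms:Credential"},
        {"@context": {"@language": "fr"}, "@id": URI, "@type": "ceterms:Credential"},
    ],
}

collisions = languages_by_id(document, default_language="en")
print(collisions)
```

Any store that keys on @id would have to merge or overwrite these three entries, which is the problem raised above.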

Going back to my earlier post, retrieving a document from the registry by its CTID (and therefore its @id) would need to give you the entire @graph - which is perhaps instead an argument in favor of using a named graph and not assigning @ids to the top-level documents inside it? Or maybe you combine the two approaches?:

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json?language=en",
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "meta:rootDocuments": [
      {
        "@id": "_:Main1",
        "@type": "ceterms:Credential"
      },
      {
        "@context": {
          "@language": "ru"
        },
        "@id": "_:Main2",
        "@type": "ceterms:Credential"
      },
      {
        "@context": {
          "@language": "fr"
        },
        "@id": "_:Main3",
        "@type": "ceterms:Credential"
      }
    ],
    "meta:references": [
      {
        "@id": "_:ABC",
        ...//Other properties
      },
      {
        "@id": "_:DEF",
        ...//Other properties
      }
    ]
  }
}

But maybe that's overengineering it a bit too much.

Dang, it seemed like we were so close to solving this.

siuc-nate commented 6 years ago

Another approach (tell me if this sounds too crazy): Use index containers ( https://json-ld.org/spec/latest/json-ld/#data-indexing ) as a sort of "super" language map (blame the example in the spec itself for inspiring this one). For this example, assume the context includes "meta:text": { "@container": "@index" }

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "@type": "ceterms:Credential",
    "meta:text": {
      "en": {
        "ceterms:name": "My Credential",
        "ceterms:description": "Text describing this credential",
        ... // Other language-dependent properties
      },
      "ru": {
        "ceterms:name": "Мои полномочия",
        "ceterms:description": "Текст, описывающий эти учетные данные",
        ... // Other language-dependent properties
      }
    },
    "ceterms:requires": [
      {
        "@type": "ceterms:ConditionProfile",
        "meta:text": {
          "en": {
            "ceterms:name": "My conditions",
            "ceterms:description": "Descriptions for earning the credential",
            "ceterms:condition": [
              "Would text in a list",
              "also be logically assumed to be in the indicated language",
              "in a context like this?",
              "Or would each of these lines need its own {en} wrapper?"
            ]
          },
          "ru": {
            "ceterms:name": "Мои условия",
            "ceterms:description": "Описания для получения удостоверений",
            "ceterms:condition": [
              "Будет ли текст в списке",
              "также логически предполагается, что они указаны на указанном языке",
              "в таком контексте?",
              "Или каждая из этих строк нуждается в собственной оболочке {ru}?"
            ]
          }
        },
        "ceterms:yearsOfExperience": 9,
        "ceterms:targetAssessment": [
          "_:Assessment1",
          "_:Assessment2"
        ]
      }
    ],
    "meta:references": [
      {
        "@type": "ceterms:AssessmentProfile",
        "@id": "_:Assessment1",
        "meta:text": {
          "en": {
            "ceterms:name": "Some assessment we don't own",
            "ceterms:description": "Text describing it"
          },
          "ru": {
            "ceterms:name": "Некоторая оценка, которой мы не владеем",
            "ceterms:description": "Текст, описывающий это"
          }
        },
        "ceterms:subjectWebpage": "http://..."
      },
      {
        ... // Properties for _:Assessment2
      }
    ]
  }
}

This gives you the benefits of a language map, in that you avoid duplicating all of the non-language-dependent data. When paired with a meta property defined as "@container": "@graph" ( https://json-ld.org/spec/latest/json-ld/#graph-containers ), which could also be used to hold competencies, concepts, etc. in the context of frameworks and schemes, it avoids the @graph problem altogether and solves the problem of where @id should live and what it should retrieve. It also avoids the complexity of doing language maps for every single property, as the @type of each language-dependent property is still xsd:string, and the JSON itself is a lot easier to publish and read (in my opinion).

There's just one problem: according to the spec, @index is meant to be used with properties that are semantically ignored. If I'm interpreting that correctly, then even though the example in the spec itself uses @index to create this sort of "super" language map, it wouldn't be a valid use of @index if your goal is for those maps to carry semantic meaning.

So close, yet so far. Maybe handling that in the definition of meta:text would work?

I'm surprised there's no way to semantically do language maps this way, given how cantankerous the vanilla language maps are - I assume this would be something like { "meta:text": { "@type": "@language" } }, but anything resembling that designation doesn't seem to show up anywhere in the schema. This proposal appears to have come up in the discussion of the spec that led to language maps as they are, and was shot down in favor of the approach we already explored (using a @graph of documents where each has its own @language in the @context) - so I'm not sure where that leaves us.
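For what a consumer of the meta:text idea would have to do, here is a minimal sketch that projects a single-language view out of such a document by merging the chosen language's block into each object. This is ordinary dictionary handling, not JSON-LD semantics, so it sidesteps (rather than answers) the question of whether @index may carry meaning; the function name and sample values are assumptions.

```python
def language_view(node, lang, text_key="meta:text"):
    """Return a copy of node with the chosen language's text merged in."""
    if isinstance(node, list):
        return [language_view(item, lang, text_key) for item in node]
    if not isinstance(node, dict):
        return node
    view = {}
    for key, value in node.items():
        if key == text_key:
            continue  # handled below
        view[key] = language_view(value, lang, text_key)
    # Merge the language-dependent properties for the requested language.
    view.update(node.get(text_key, {}).get(lang, {}))
    return view


credential = {
    "@type": "ceterms:Credential",
    "meta:text": {
        "en": {"ceterms:name": "My Credential"},
        "ru": {"ceterms:name": "Мои полномочия"},
    },
    "ceterms:requires": [
        {
            "@type": "ceterms:ConditionProfile",
            "meta:text": {
                "en": {"ceterms:condition": ["condition one"]},
                "ru": {"ceterms:condition": ["условие один"]},
            },
            "ceterms:yearsOfExperience": 9,
        }
    ],
}

ru = language_view(credential, "ru")
print(ru["ceterms:name"])
```

In this sketch a list of strings inside a language block is simply taken to be in that block's language, which is one possible answer to the question posed inside the example.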

siuc-nate commented 6 years ago

As a sanity check, I did a language map version of the above example, and I guess it isn't all that different in terms of overall complexity (although the per-term usage of language maps still presents the difficulties I've outlined before):

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@id": "http://credentialengineregistry.org/resources/[CTID#123]",
    "@type": "ceterms:Credential",
    "ceterms:name": {
      "en": "My Credential",
      "ru": "Мои полномочия"
    },
    "ceterms:description": {
      "en": "Text describing this credential",
      "ru": "Текст, описывающий эти учетные данные"
    },
    // Other language-dependent properties
    "ceterms:requires": [
      {
        "@type": "ceterms:ConditionProfile",
        "ceterms:name": {
          "en": "My conditions",
          "ru": "Мои условия"
        },
        "ceterms:description": {
          "en": "Descriptions for earning the credential",
          "ru": "Описания для получения удостоверений"
        },
        "ceterms:condition": [
          {
            "en": "Would text in a list",
            "ru": "Будет ли текст в списке"
          },
          {
            "en": "also be logically assumed to be in the indicated language",
            "ru": "также логически предполагается, что они указаны на указанном языке"
          },
          {
            "en": "in a context like this?",
            "ru": "в таком контексте?"
          },
          {
            "en": "Or would each of these lines need its own {en} wrapper?",
            "ru": "Или каждая из этих строк нуждается в собственной оболочке {ru}?"
          }
        ],
        "ceterms:yearsOfExperience": 9,
        "ceterms:targetAssessment": [
          "_:Assessment1",
          "_:Assessment2"
        ]
      }
    ],
    "meta:references": [
      {
        "@type": "ceterms:AssessmentProfile",
        "@id": "_:Assessment1",
        "ceterms:name": {
          "en": "Some assessment we don't own",
          "ru": "Некоторая оценка, которой мы не владеем"
        },
        "ceterms:description": {
          "en": "Text describing it",
          "ru": "Текст, описывающий это"
        },
        "ceterms:subjectWebpage": "http://..."
      },
      {
        ... // Properties for _:Assessment2
      }
    ]
  }
}

So maybe the approach in my post above isn't all that helpful - I'll leave it there for the sake of documentation nonetheless.
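For comparison, reading per-property language maps could look like the sketch below: a helper collapses a language map, or a list of language maps, to plain strings with a fallback language. The helper name and the fallback behaviour are assumptions, and it should only be applied to properties known to be language-dependent, since it cannot tell a language map from any other nested object.

```python
def pick_language(value, lang, fallback="en"):
    """Collapse a language map (or a list of them) to plain strings."""
    if isinstance(value, list):
        return [pick_language(item, lang, fallback) for item in value]
    if isinstance(value, dict):
        # Prefer the requested language, then the fallback, else None.
        return value.get(lang, value.get(fallback))
    return value  # already a plain value


name = {"en": "My conditions", "ru": "Мои условия"}
conditions = [
    {"en": "in a context like this?", "ru": "в таком контексте?"},
    {"en": "condition two"},  # no Russian text supplied
]

print(pick_language(name, "ru"))
print(pick_language(conditions, "ru"))
```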

cwd-mparsons commented 6 years ago

@siuc-nate @stuartasutton @Lomilar @jkitchensSIUC Decision Request! There are many topics in this thread. The most pressing decision request relates to competencies. For my testing (via github), I had been using the current approach of separate schemas for the framework and the competency. I have not requested these to be updated in the registry sandbox, given the uncertainty of the final approach. The API will have a specific endpoint for publishing from CASS, via the CTDL publisher. If we are going to go with the @graph approach, I will need to:

So, for competencies only, what is the decision:

siuc-nate commented 6 years ago

I came up with some fuller, more realistic examples. I was going to do some other ones (namely competency framework related ones) but I think these were the last nail in the coffin for the non-language-map approach. They revealed something that wasn't very obvious from the basic examples so far.

Specifically, while it is great if you only have one language:

Credential + 2 bnodes - one language
{
  "envelope_id": ".../123",
  "decoded_payload": {
    "@context": "https://credreg.net/ctdl/schema/context/json?language=en",
    "@id": "https://credentialengineregistry.org/resources/[CTID#123]",
    "@graph": [
      {
        "@id": "https://credentialengineregistry.org/resources/[CTID#123]/en",
        "@type": "ceterms:Certificate",
        "ceterms:name": "My Credential",
        "ceterms:description": "Description of this credential",
        "ceterms:subjectWebpage": "http://credreg.net",
        "ceterms:keyword": [
          "keyword 1", 
          "keyword 2", 
          "keyword 3"
        ],
        "ceterms:ownedBy": [
          "https://credentialengineregistry.org/resources/[CTID#456]"
        ],
        "ceterms:audienceLevelType": [
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "Beginner",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BeginnerLevel"
          },
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "Bachelors Degree Level",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BachelorsDegreeLevel"
          }
        ],
        "ceterms:requires": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "This describes the conditions",
            "ceterms:condition": [
              "condition one",
              "condition two"
            ],
            "ceterms:yearsOfExperience": 5,
            "ceterms:targetCompetency": [
              {
                "@type": "ceterms:CredentialAlignmentObject",
                "ceterms:targetNodeDescription": "Text of the competency",
                "ceterms:targetUrl": "https://credentialengineregistry.org/resources/[CTID#789]"
              }
            ],
            "ceterms:targetAssessment": [
              "_:AssessmentABC"
            ]
          }
        ],
        "ceterms:isAdvancedStandingFor": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "This credential is advanced standing for the other credential",
            "ceterms:targetCredential": [
              "_:CredentialABC"
            ]
          }
        ]
      },
      {
        "@id": "_:AssessmentABC",
        "@type": "ceterms:AssessmentProfile",
        "ceterms:name": "Name of the assessment",
        "ceterms:subjectWebpage": "http://somesite.org/abc"
      },
      {
        "@id": "_:CredentialABC",
        "@type": "ceterms:Certification",
        "ceterms:name": "Name of the credential",
        "ceterms:subjectWebpage": "http://someothersite.org/abc"
      }
    ]
  }
}

...It's overly verbose when more get involved, since a lot of the properties aren't language-dependent, resulting in more duplicate data than I thought there would be:

Credential + 2 bnodes - three languages
{
  "envelope_id": ".../123",
  "decoded_payload": {
    "@context": "https://credreg.net/ctdl/schema/context/json",
    "@id": "https://credentialengineregistry.org/resources/[CTID#123]",
    "@graph": [
      {
        "@context": {
          "@language": "en"
        },
        "@id": "https://credentialengineregistry.org/resources/[CTID#123]/en",
        "@type": "ceterms:Certificate",
        "ceterms:name": "My Credential",
        "ceterms:description": "Description of this credential",
        "ceterms:subjectWebpage": "http://credreg.net",
        "ceterms:keyword": [
          "keyword 1", 
          "keyword 2", 
          "keyword 3"
        ],
        "ceterms:ownedBy": [
          "https://credentialengineregistry.org/resources/[CTID#456]"
        ],
        "ceterms:audienceLevelType": [
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "Beginner",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BeginnerLevel"
          },
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "Bachelors Degree Level",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BachelorsDegreeLevel"
          }
        ],
        "ceterms:requires": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "This describes the conditions",
            "ceterms:condition": [
              "condition one",
              "condition two"
            ],
            "ceterms:yearsOfExperience": 5,
            "ceterms:targetCompetency": [
              {
                "@type": "ceterms:CredentialAlignmentObject",
                "ceterms:targetNodeDescription": "Text of the competency",
                "ceterms:targetUrl": "https://credentialengineregistry.org/resources/[CTID#789]"
              }
            ],
            "ceterms:targetAssessment": [
              "_:AssessmentABC"
            ]
          }
        ],
        "ceterms:isAdvancedStandingFor": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "This credential is advanced standing for the other credential",
            "ceterms:targetCredential": [
              "_:CredentialABC"
            ]
          }
        ]
      },
      {
        "@context": {
          "@language": "ru"
        },
        "@id": "https://credentialengineregistry.org/resources/[CTID#123]/ru",
        "@type": "ceterms:Certificate",
        "ceterms:name": "Мои полномочия",
        "ceterms:description": "Описание этих учетных данных",
        "ceterms:subjectWebpage": "http://credreg.net",
        "ceterms:keyword": [
          "ключевое слово 1", 
          "ключевое слово 2", 
          "ключевое слово 3"
        ],
        "ceterms:ownedBy": [
          "https://credentialengineregistry.org/resources/[CTID#456]"
        ],
        "ceterms:audienceLevelType": [
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "начинающий",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BeginnerLevel"
          },
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "степень бакалавра",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BachelorsDegreeLevel"
          }
        ],
        "ceterms:requires": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "это описывает условия",
            "ceterms:condition": [
              "условие один",
              "условие два"
            ],
            "ceterms:yearsOfExperience": 5,
            "ceterms:targetCompetency": [
              {
                "@type": "ceterms:CredentialAlignmentObject",
                "ceterms:targetNodeDescription": "текст компетенции",
                "ceterms:targetUrl": "https://credentialengineregistry.org/resources/[CTID#789]"
              }
            ],
            "ceterms:targetAssessment": [
              "_:AssessmentABC"
            ]
          }
        ],
        "ceterms:isAdvancedStandingFor": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "Эти полномочия расширены для других учетных данных",
            "ceterms:targetCredential": [
              "_:CredentialABC"
            ]
          }
        ]
      },
      {
        "@context": {
          "@language": "es"
        },
        "@id": "https://credentialengineregistry.org/resources/[CTID#123]/es",
        "@type": "ceterms:Certificate",
        "ceterms:name": "Mi credencial",
        "ceterms:description": "Descripción de esta credencial",
        "ceterms:subjectWebpage": "http://credreg.net",
        "ceterms:keyword": [
          "palabra clave 1", 
          "palabra clave 2", 
          "palabra clave 3"
        ],
        "ceterms:ownedBy": [
          "https://credentialengineregistry.org/resources/[CTID#456]"
        ],
        "ceterms:audienceLevelType": [
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "principiante",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BeginnerLevel"
          },
          {
            "@type": "ceterms:CredentialAlignmentObject",
            "ceterms:targetNodeName": "nivel de licenciatura",
            "ceterms:targetUrl": "https://credreg.net/ctdl/vocabs/audLevel/BachelorsDegreeLevel"
          }
        ],
        "ceterms:requires": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "esto describe las condiciones",
            "ceterms:condition": [
              "condición uno",
              "condición dos"
            ],
            "ceterms:yearsOfExperience": 5,
            "ceterms:targetCompetency": [
              {
                "@type": "ceterms:CredentialAlignmentObject",
                "ceterms:targetNodeDescription": "texto de la competencia",
                "ceterms:targetUrl": "https://credentialengineregistry.org/resources/[CTID#789]"
              }
            ],
            "ceterms:targetAssessment": [
              "_:AssessmentABC"
            ]
          }
        ],
        "ceterms:isAdvancedStandingFor": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:description": "esta credencial es avanzada para la otra credencial",
            "ceterms:targetCredential": [
              "_:CredentialABC"
            ]
          }
        ]
      },
      {
        "@context": {
          "@language": "en"
        },
        "@id": "_:AssessmentABC",
        "@type": "ceterms:AssessmentProfile",
        "ceterms:name": "Name of the assessment",
        "ceterms:subjectWebpage": "http://somesite.org/abc"
      },
      {
        "@context": {
          "@language": "en"
        },
        "@id": "_:CredentialABC",
        "@type": "ceterms:Certification",
        "ceterms:name": "Name of the credential",
        "ceterms:subjectWebpage": "http://someothersite.org/abc"
      },
      {
        "@context": {
          "@language": "ru"
        },
        "@id": "_:AssessmentABC",
        "@type": "ceterms:AssessmentProfile",
        "ceterms:name": "название оценки",
        "ceterms:subjectWebpage": "http://somesite.org/abc"
      },
      {
        "@context": {
          "@language": "ru"
        },
        "@id": "_:CredentialABC",
        "@type": "ceterms:Certification",
        "ceterms:name": "имя учетных данных",
        "ceterms:subjectWebpage": "http://someothersite.org/abc"
      },
      {
        "@context": {
          "@language": "es"
        },
        "@id": "_:AssessmentABC",
        "@type": "ceterms:AssessmentProfile",
        "ceterms:name": "nombre de la evaluación",
        "ceterms:subjectWebpage": "http://somesite.org/abc"
      },
      {
        "@context": {
          "@language": "es"
        },
        "@id": "_:CredentialABC",
        "@type": "ceterms:Certification",
        "ceterms:name": "nombre de la credencial",
        "ceterms:subjectWebpage": "http://someothersite.org/abc"
      }
    ]
  }
}

Based on our discussions and examples (including this last one), the fact that we were planning to implement language maps anyway, the fact that CASS already uses them, their use in the JSON-LD spec, and our very pressing need to move forward, I'm afraid I'll have to give in on this one and hope we can work out a good way to explain language maps to developers/partners after the fact. I am glad we at least explored other options, as I would have been left wondering "what if" otherwise. Thanks for all the thought-provoking feedback/pushback from the rest of you, as well.
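The duplication in the three-language example can be quantified with a small sketch that compares the per-language copies of one object and lists the properties whose values are identical in every copy. The function name is hypothetical and the variants below are trimmed-down versions of the objects in the example.

```python
def shared_properties(variants):
    """Return property names whose values are identical in every variant."""
    first, *rest = variants
    return sorted(
        key
        for key, value in first.items()
        if key not in ("@id", "@context")
        and all(other.get(key) == value for other in rest)
    )


OWNER = "https://credentialengineregistry.org/resources/[CTID#456]"
variants = [
    {"@context": {"@language": "en"}, "@type": "ceterms:Certificate",
     "ceterms:name": "My Credential",
     "ceterms:subjectWebpage": "http://credreg.net",
     "ceterms:ownedBy": [OWNER]},
    {"@context": {"@language": "ru"}, "@type": "ceterms:Certificate",
     "ceterms:name": "Мои полномочия",
     "ceterms:subjectWebpage": "http://credreg.net",
     "ceterms:ownedBy": [OWNER]},
    {"@context": {"@language": "es"}, "@type": "ceterms:Certificate",
     "ceterms:name": "Mi credencial",
     "ceterms:subjectWebpage": "http://credreg.net",
     "ceterms:ownedBy": [OWNER]},
]

duplicated = shared_properties(variants)
print(duplicated)
```

Everything this returns is repeated once per language in the one-document-per-language layout, whereas a language map would state it once.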

Anyway, between that and all of us (I think) being on board with using @graph, it looks like the current solutions are:

Which leaves, unless I'm missing something, just one problem: Does a URI like https://credentialengineregistry.org/resources/ce-[UUID] belong at the @graph level, or in the (main) object inside the graph? @stuartasutton had suggested putting that URI inside the main object and using https://credentialengineregistry.org/graph/ce-[UUID] for the graph, and that may well solve it, but is there ever a case where you wouldn't want the other contents of the graph (the bnodes, and/or competencies)?

Consider also that we may want to reserve an endpoint like /graph/ for a service that either searches the registry in a graph-like fashion, and/or retrieves (on-demand by crawling the links) the entire description set for a given CTID rather than just the stuff that was directly published with it in its @graph.

Given that, with the use of language maps, there would only be one main document in the @graph, maybe it's acceptable to just not give the main document an @id at all? Or to give it one that hangs off of the @graph's URI, e.g.: https://credentialengineregistry.org/resources/ce-[UUID] for the @graph, and https://credentialengineregistry.org/resources/ce-[UUID]/top (or /main or /core or whatever) for the primary document?

Lomilar commented 6 years ago

The graph may be called a "named graph" but it doesn't have to have an @id. If you want to name it, @stuartasutton 's suggestion sounds good.

Coming at this from the outside I would expect, as a naive developer:

  1. When I go to any URL, that I get that resource back.
  2. If I get a graph back with the object flattened into a @graph, that's probably okay. It's like asking for a bus and getting back a box of legos. Not a big deal, I can probably put it together or employ a JSON-LD processor to reassemble it. It's very similar to every other web service that returns a 'result object' with information that should be in the HTTP response header.
  3. The first thing I'm doing after processing the @graph is caching this data somewhere, which would be a 1-1 map of @id to {...}. So, everything in the @graph should have a unique ID with no duplicates. Again, I'm probably throwing away the @id of the @graph because it isn't important, just the data inside is important.
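The caching step in point 3 might look like the following sketch: it builds the 1-1 map of @id to object and refuses a @graph that contains duplicate or missing identifiers. The function name and error messages are assumptions; the outer @id of the @graph is simply ignored, as described.

```python
def cache_graph(document):
    """Build a 1-1 map of @id to object from the document's @graph."""
    cache = {}
    for node in document.get("@graph", []):
        node_id = node.get("@id")
        if node_id is None:
            raise ValueError("object without @id cannot be cached")
        if node_id in cache:
            raise ValueError(f"duplicate @id in @graph: {node_id}")
        cache[node_id] = node
    return cache


document = {
    "@id": "https://credentialengineregistry.org/graph/ce-123",
    "@graph": [
        {"@id": "https://credentialengineregistry.org/resources/ce-123",
         "@type": "ceterms:Credential"},
        {"@id": "_:ABC", "@type": "ceterms:AssessmentProfile"},
    ],
}

cache = cache_graph(document)
print(sorted(cache))
```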

Also note:

That's all the opinions I got.

stuartasutton commented 6 years ago

Thanks, Nate. I know this has not been easy, but coming to this conclusion on your own is beneficial.

You state:

"Does a URI like https://credentialengineregistry.org/resources/ce-[UUID] belong at the @graph level, or in the (main) object inside the graph?"

Response: On the main object in the @graph.

"@stuartasutton had suggested putting that URI inside the main object and using https://credentialengineregistry.org/graph/ce-[UUID] for the graph"

Response: I am still of the opinion that if the @graph is to have a URI at all, it should be as I suggested. It's simple to explain and keeps our resource URIs consistent. So, we'd have:

https://credentialengineregistry.org/graph/ce-[UUID] (resolve to content of the @graph--i.e., the description set).

https://credentialengineregistry.org/resource/ce-[UUID] (resolve to the content of the single object identified by the URI).

I'm not convinced that /graph/ would have a better use than this. So in the end, we'd have:

https://credentialengineregistry.org/graph/ce-6d62b61a-033c-417a-9d53-ad930857465b

https://credentialengineregistry.org/resource/ce-6d62b61a-033c-417a-9d53-ad930857465b

Simple to explain:

The /resource/ URI returns exactly what you are asking for with the URI. The /graph/ URI returns a description set of closely related entities (encompassed by the @graph).

Naming the @graph by URI is not mandatory; but, it does buy you the functionality of being able to reference the full contents of the @graph as described above.

Lomilar commented 6 years ago

/graph/ returning the graph the object came in is good.

"The /resource/ URI returns exactly what you are asking for with the URI."

It can't. It has to return a @graph (not necessarily the original graph) that contains the object requested along with any bnodes that are referenced by that object. :-/

Otherwise, agreed.

siuc-nate commented 6 years ago

This seems like something that should be covered by the spec - unless you just mean that /resource/ should return the @graph and the URI of the main thing in the @graph should be something else (or blank)?

Lomilar commented 6 years ago

/resource/ cannot return just the object. It has to return a statement set (@graph) with at least the object and any bnodes within, because those bnodes aren't locatable.

So, /resource/ isn't returning exactly what you are asking for, it's returning a statement set with what you asked for inside.
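A minimal sketch of that behaviour: given the published @graph, collect the requested object plus every blank node reachable from it, and return those as a smaller @graph. The function names are hypothetical, and the sketch treats any string beginning with "_:" as a blank node reference, which is a simplification a real JSON-LD processor would not need.

```python
def find_bnode_refs(value):
    """Yield every blank node identifier mentioned anywhere inside value."""
    if isinstance(value, str):
        if value.startswith("_:"):
            yield value
    elif isinstance(value, list):
        for item in value:
            yield from find_bnode_refs(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_bnode_refs(item)


def resource_response(graph_doc, resource_id):
    """Return a @graph holding the resource and the bnodes it references."""
    nodes = {node["@id"]: node for node in graph_doc["@graph"]}
    wanted, seen, queue = [], set(), [resource_id]
    while queue:
        node_id = queue.pop(0)
        if node_id in seen or node_id not in nodes:
            continue
        seen.add(node_id)
        wanted.append(nodes[node_id])
        queue.extend(find_bnode_refs(nodes[node_id]))
    return {"@context": graph_doc.get("@context"), "@graph": wanted}


CRED = "https://credentialengineregistry.org/resources/ce-123"
graph_doc = {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@graph": [
        {"@id": CRED, "ceterms:requires": [
            {"ceterms:targetAssessment": ["_:ABC", "_:DEF"]}]},
        {"@id": "_:ABC", "@type": "ceterms:AssessmentProfile"},
        {"@id": "_:DEF", "@type": "ceterms:AssessmentProfile"},
        {"@id": "_:XYZ", "@type": "ceterms:AssessmentProfile"},  # unreferenced
    ],
}

response = resource_response(graph_doc, CRED)
returned_ids = [node["@id"] for node in response["@graph"]]
print(returned_ids)
```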

siuc-nate commented 6 years ago

While we're at it, I should probably bring up:

"If I get a graph back with the object flattened into a @graph, that's probably okay. It's like asking for a bus and getting back a box of legos. Not a big deal, I can probably put it together or employ a JSON-LD processor to reassemble it. It's very similar to every other web service that returns a 'result object' with information that should be in the HTTP response header."

The way things are currently written, the only parts you'd need to assemble would be the bnodes that serve as references to top-level objects that don't exist in the registry. Everything else (the many ____Profiles) will be structurally a part of the main JSON document. If that's a problem, we need to solve it now. Otherwise, it should make the data easier to work with.

stuartasutton commented 6 years ago

I'm talking about what you'd get with RDF from a quadstore. To get back everything within the @graph, you'd need to resolve the 4th member of the quad, since it identifies the full graph. See https://json-ld.org/spec/latest/json-ld/#named-graphs and check the accompanying data table. To get back everything within the bounds of the @graph you'd need to retrieve everything with a domain of _:graph (if it were a full URI and not a bnode). To retrieve everything describing Manu, you'd return those triples with a domain of http://manu.sporny.org/about#manu. To return all the information about Gregg, you'd retrieve those triples with a domain of http://greggkellogg.net/foaf#me. I think that comports with what I said.
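The two kinds of retrieval can be sketched with quads held in memory: selecting on the fourth member returns the whole description set, while selecting on the subject returns only the statements about one resource. This is an illustration in plain Python, not a quadstore API; the sample quads and the "_:b0" identifier are invented.

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])

GRAPH = "https://credentialengineregistry.org/graph/ce-123"
CREDENTIAL = "https://credentialengineregistry.org/resource/ce-123"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

quads = [
    Quad(CREDENTIAL, RDF_TYPE, "ceterms:Credential", GRAPH),
    Quad(CREDENTIAL, "ceterms:requires", "_:b0", GRAPH),
    Quad("_:b0", RDF_TYPE, "ceterms:ConditionProfile", GRAPH),
]


def in_graph(quads, graph_name):
    """Everything within the named graph (the /graph/ style of lookup)."""
    return [quad for quad in quads if quad.graph == graph_name]


def about(quads, subject):
    """Only the statements whose subject is the given resource."""
    return [quad for quad in quads if quad.subject == subject]


print(len(in_graph(quads, GRAPH)), len(about(quads, CREDENTIAL)))
```

Note that the subject-only lookup leaves out the statement about "_:b0", which is the gap a plain JSON consumer would notice.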

Lomilar commented 6 years ago

Got it. Graph URL being a first order element in N-Quads and additional structure in JSON-LD is where I lost understanding.

What you said comports from the RDF side, less from the naive JSON/API consumer/developer side. I didn't know until yesterday what the difference between Triples and Quads was.

siuc-nate commented 6 years ago

To try to summarize where everyone is: Given this data:

{
  "decoded_payload": {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@id": "https://credentialengineregistry.org/graph/ce-123",
    "@graph": [
      {
        "@type": "ceterms:Credential",
        "@id": "https://credentialengineregistry.org/resources/ce-123",
        "ceterms:requires": [
          {
            "@type": "ceterms:ConditionProfile",
            "ceterms:targetAssessment": [
              "https://credentialengineregistry.org/resources/ce-890",
              "_:ABC",
              "_:DEF"
            ]
          }
        ]
      },
      {
        "@id": "_:ABC",
        "@type": "ceterms:AssessmentProfile"
      },
      {
        "@id": "_:DEF",
        "@type": "ceterms:AssessmentProfile"
      }
    ]
  }
}

Fill in the blanks:

Resolving https://credentialengineregistry.org/graph/ce-123 returns:

Resolving https://credentialengineregistry.org/resources/ce-123 returns:

The canonical @id of the credential therefore is:

stuartasutton commented 6 years ago

Nate, I am in transit to Toronto all day. Will try and get to this this evening or early tomorrow morning. If I have WiFi and time at SFO airport this am, I’ll respond then.

(Sent by email in reply to siuc-nate's comment of Apr 3, 2018, 2:51 PM.)

Lomilar commented 6 years ago

Resolving https://credentialengineregistry.org/graph/ce-123 returns:

The above, as is.

Resolving https://credentialengineregistry.org/resources/ce-123 returns:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "@id": "https://credentialengineregistry.org/resources/ce-123",
  "@type": "ceterms:Credential",
  "ceterms:requires": {
    "@id": "_:b0",
    "@type": "ceterms:ConditionProfile",
    "ceterms:targetAssessment": [
      "https://credentialengineregistry.org/resources/ce-890",
      {
        "@id": "_:b1",
        "@type": "ceterms:AssessmentProfile"
      },
      {
        "@id": "_:b2",
        "@type": "ceterms:AssessmentProfile"
      }
    ]
  }
}

This was accomplished by first changing the context so it included targetAssessment @type:@id, then framing the @graph, fetching the resource out of the frame with @id:".../resources/ce-123" and compacting it.

Framing and Compaction are very typical JSON-LD processor transforms available in all sorts of libraries. They are powerful but finicky (like most RDF transforms and processors). Based on previous conversations, the context should probably be updated so that @id is transformed to id and @type is transformed to type. I'd prefer if all the objects in here had URLs for URIs, but that's probably not a big deal.

With just context changes and the application of the JSON-LD processor, the above can become:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "id": "https://credentialengineregistry.org/resources/ce-123",
  "type": "ceterms:Credential",
  "ceterms:requires": {
    "id": "_:b0",
    "type": "ceterms:ConditionProfile",
    "ceterms:targetAssessment": [
      "https://credentialengineregistry.org/resources/ce-890",
      {
        "id": "_:b1",
        "type": "ceterms:AssessmentProfile"
      },
      {
        "id": "_:b2",
        "type": "ceterms:AssessmentProfile"
      }
    ]
  }
}

And if the default @vocab is set in the context to ceterms, it could become:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "id": "https://credentialengineregistry.org/resources/ce-123",
  "type": "Credential",
  "requires": {
    "id": "_:b0",
    "type": "ConditionProfile",
    "targetAssessment": [
      "https://credentialengineregistry.org/resources/ce-890",
      {
        "id": "_:b1",
        "type": "AssessmentProfile"
      },
      {
        "id": "_:b2",
        "type": "AssessmentProfile"
      }
    ]
  }
}

which is as close to plain JSON as I can make it.
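
For readers without a JSON-LD processor at hand, the effect of those two context changes can be imitated in plain Python. This is only a sketch of the outcome, not of compaction itself: it assumes the only prefix to drop is `ceterms:` and that `@id`/`@type` are the only keywords to alias. The `simplify` function is a hypothetical helper.

```python
def simplify(value, prefix="ceterms:"):
    """Rename @id/@type to id/type and drop the default-vocabulary prefix."""

    def strip(term):
        if isinstance(term, list):
            return [strip(t) for t in term]
        if isinstance(term, str) and term.startswith(prefix):
            return term[len(prefix):]
        return term

    if isinstance(value, list):
        return [simplify(v, prefix) for v in value]
    if isinstance(value, dict):
        out = {}
        for key, val in value.items():
            if key == "@context":
                out[key] = val              # the context reference is kept as-is
            elif key == "@id":
                out["id"] = val             # identifiers are never prefix-stripped
            elif key == "@type":
                out["type"] = strip(val)    # type names lose the prefix
            else:
                out[strip(key)] = simplify(val, prefix)
        return out
    return value


doc = {
    "@id": "https://credentialengineregistry.org/resources/ce-123",
    "@type": "ceterms:Credential",
    "ceterms:requires": {"@id": "_:b0", "@type": "ceterms:ConditionProfile"},
}
simple = simplify(doc)
```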

I would personally go through and remove all the _:b* ids so that they don't confuse people, which is another step toward making it easy to understand. It may not be necessary though.

P.S. This kind of stuff is how CASS translates from one schema to another.

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "id": "https://credentialengineregistry.org/resources/ce-123",
  "type": "Credential",
  "requires": {
    "type": "ConditionProfile",
    "targetAssessment": [
      "https://credentialengineregistry.org/resources/ce-890",
      {
        "type": "AssessmentProfile"
      },
      {
        "type": "AssessmentProfile"
      }
    ]
  }
}
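
Removing the blank-node ids as shown above could be sketched like this in Python. It assumes the ids have already been aliased to `id` and that every blank node is embedded where it is used; if anything still referred to a blank node by label, stripping the label would break that reference. `strip_blank_ids` is a hypothetical helper.

```python
def strip_blank_ids(value, id_key="id"):
    """Drop id entries whose value is a blank-node label such as "_:b0"."""
    if isinstance(value, list):
        return [strip_blank_ids(v, id_key) for v in value]
    if isinstance(value, dict):
        return {
            k: strip_blank_ids(v, id_key)
            for k, v in value.items()
            # Real URIs are kept; only "_:" labels are removed.
            if not (k == id_key and isinstance(v, str) and v.startswith("_:"))
        }
    return value


tree = {
    "id": "https://credentialengineregistry.org/resources/ce-123",
    "type": "Credential",
    "requires": {
        "id": "_:b0",
        "type": "ConditionProfile",
        "targetAssessment": [
            "https://credentialengineregistry.org/resources/ce-890",
            {"id": "_:b1", "type": "AssessmentProfile"},
        ],
    },
}
cleaned = strip_blank_ids(tree)
```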

The canonical @id of the credential therefore is:

https://credentialengineregistry.org/resources/ce-123

siuc-nate commented 6 years ago

Interesting, but it breaks 2 things:

public class ConditionProfile { public List<string> TargetAssessment { get; set; } }

public class BlankNode { public string Type { get; set; } public string Id { get; set; } }

I can compensate for differences in property names via standard JSON libraries, but what I can't do is take data like:

[ "https://credentialengineregistry.org/resources/ce-890", { "id": "_:b1", "type": "ceterms:AssessmentProfile" }, { "id": "_:b2", "type": "ceterms:AssessmentProfile" } ]


and deserialize it into a `List<string>`, since the last two objects are not `string`s.  I would instead have to make the `TargetAssessment` property into a `List<dynamic>` or something along those lines, and loop through/parse every single one to figure out what to do with it - which in turn means I also have to maintain a second copy of my Credential class with a normalized definition of `TargetAssessment` that I can map everything to.  

In other words, I have to deserialize to a middle ground class, then inspect my way through it (recursively, in a real-world scenario), mapping to and populating a "real" class hierarchy that I can then work with elsewhere.  This overhead applies to every single property in CTDL that points to a top-level object that may or may not be published to the registry. It is also then doubled, since you have to accommodate going back the other way for publishing.

You can start to see why I would _really_ like to avoid this headache (and why I wouldn't want to put other developers through it).  It also makes for a more brittle data structure that can't handle schema changes as easily.  I had a similar justification for language maps, but we're stuck with those.

Anyway, in the C# structure above, there would be some other class or property to accommodate the `@graph` (most likely just a class that extends `List<dynamic>`). 

There are JSON libraries that make working with this kind of problem a little easier, but ultimately you still have to deal with different data types in the same list and that just isn't something a strictly-typed language is going to do easily (I don't even like doing it in javascript, to be honest, since you have to check every value's type before you can use it - so I may be a bit biased).
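
To make the per-value type check concrete, here is a small Python sketch of the "middle ground" step: every entry of a mixed list is inspected and mapped onto one uniform shape. `NodeRef` and `normalize` are hypothetical names invented for this illustration; they are not part of CTDL or of any Registry client.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NodeRef:
    """Uniform stand-in for either a URI string or an embedded object."""
    id: Optional[str]
    type: Optional[str] = None
    data: Dict[str, Any] = field(default_factory=dict)
    embedded: bool = False


def normalize(items: List[Any]) -> List[NodeRef]:
    refs = []
    for item in items:
        if isinstance(item, str):
            # A bare string is a reference to a separately published resource.
            refs.append(NodeRef(id=item))
        elif isinstance(item, dict):
            # An object is an embedded (possibly blank) node.
            extra = {k: v for k, v in item.items() if k not in ("id", "type")}
            refs.append(NodeRef(item.get("id"), item.get("type"), extra, True))
        else:
            raise TypeError(f"unexpected list entry: {item!r}")
    return refs


mixed = [
    "https://credentialengineregistry.org/resources/ce-890",
    {"id": "_:b1", "type": "ceterms:AssessmentProfile"},
    {"id": "_:b2", "type": "ceterms:AssessmentProfile"},
]
refs = normalize(mixed)
```

Every property that can hold such a mixed list needs the same treatment, and publishing needs the reverse mapping, which is the overhead described above.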

Having said all of that, maybe the answer in this case would be for me as the developer to just take any instance of a `/resources/` URI I come across and switch it to a `/graph/` URI, retrieve that, and parse that instead.  Hm..

Lomilar commented 6 years ago

Preprocessing takes care of that most of the time. Having custom deserialization that deserializes from a string or json object by either storing the string in a common ancestor class and lazily fetching and deserializing the resource on access or fetching the resource and then deserializing that should deal with that complexity.
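
A minimal Python sketch of the lazy variant, under stated assumptions: `LazyResource` is a hypothetical wrapper, and `fetch` is whatever function the caller supplies to resolve a URI (stubbed here with `fake_fetch`, so no network access is involved).

```python
class LazyResource:
    """Holds either a URI or an embedded object; resolves the URI on first use."""

    def __init__(self, value, fetch):
        self._fetch = fetch
        if isinstance(value, str):
            self.uri, self._data = value, None             # reference only
        else:
            self.uri, self._data = value.get("id"), value  # already embedded

    @property
    def data(self):
        if self._data is None:
            self._data = self._fetch(self.uri)  # fetched once, then cached
        return self._data


calls = []


def fake_fetch(uri):
    # Stand-in for a real HTTP lookup against the registry.
    calls.append(uri)
    return {"id": uri, "type": "ceterms:AssessmentProfile"}


raw = [
    "https://credentialengineregistry.org/resources/ce-890",
    {"id": "_:b1", "type": "ceterms:AssessmentProfile"},
]
items = [LazyResource(v, fake_fetch) for v in raw]
```

Callers then see one type for every list entry, and only pay for a lookup when they actually read a referenced resource.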

But this is an incredibly common problem among strongly typed languages, one that C# and Java wish they could be the leaders of so that everything could be strongly typed, but it's JSON... JavaScript Object Notation, so typeless is the name of the game. SOAP tried and failed, so we're now left with our square peg and round hole.

I tend to sidestep this in strongly typed languages by encouraging the use of libraries that already solve this problem... if they exist.

siuc-nate commented 6 years ago

Figure I may as well throw my two cents in on the original question while I'm at it:

Resolving https://credentialengineregistry.org/graph/ce-123 returns:

Everything inside the decoded_payload, verbatim. You asked for the graph by name, and that is what you get.

Resolving https://credentialengineregistry.org/resources/ce-123 returns:

Everything inside the decoded_payload, almost verbatim. In my opinion, the bnodes are part of the credential's data, since the credential's data is incomplete without them (the credential's data also references bnode IDs, which means you have a problem if you don't include the bnodes). Therefore you need a @graph wrapper around it, even if you think of it as a @graph that was generated on-demand and just coincidentally happens to be identical to the named @graph. The @context applied to everything in the original data, so it is correct to apply it to everything in the generated @graph, too. Therefore:

{
  "@context": "http://credreg.net/ctdl/schema/context/json",
  "@graph": [
    {
      "@type": "Credential",
      "@id": "https://credentialengineregistry.org/resources/ce-123",
      "ceterms:requires": [
        {
          "@type": "ceterms:ConditionProfile",
          "ceterms:targetAssessment": [
            "https://credentialengineregistry.org/resources/ce-890",
            "_:ABC",
            "_:DEF"
          ]
        }
      ]
    },
    {
      "@id": "_:ABC",
      "@type": "ceterms:AssessmentProfile"
    },
    {
      "@id": "_:DEF",
      "@type": "ceterms:AssessmentProfile"
    }
  ]
}

The only real difference is that the @id for the @graph was removed, because in this case, technically (or rather, semantically), the @graph was anonymously generated as a wrapper to handle the bnodes (it would still be generated even if there were no bnodes in order to ensure consistency in returned data). Whether this is actually what happens at the code level, or if the @id is just stripped out, is probably irrelevant.

Since the data is identical either way, it may be technically correct to include the @id for the @graph; I don't know. I don't care enough to argue about whether or not that should be included; it isn't worth holding up the rest of our implementation. As long as I get back one consistent format that doesn't require a bunch of edge case handling, I'm happy.
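
One way the /resources/ response described here could be assembled from a stored payload is sketched below in Python. It is an assumption-laden illustration, not the Registry's implementation: `resource_graph` and `blank_refs` are hypothetical helpers, the stored payload is assumed to carry its own `@id`, and blank-node references are assumed to be strings starting with `_:`.

```python
def blank_refs(value):
    """Collect blank-node labels referenced anywhere inside a value."""
    if isinstance(value, str):
        return [value] if value.startswith("_:") else []
    if isinstance(value, list):
        return [ref for v in value for ref in blank_refs(v)]
    if isinstance(value, dict):
        # A node's own @id is not a reference to something else.
        return [ref for k, v in value.items() if k != "@id" for ref in blank_refs(v)]
    return []


def resource_graph(payload, root_id):
    """Wrap the requested node and the blank nodes it reaches in an anonymous @graph."""
    by_id = {n["@id"]: n for n in payload["@graph"] if "@id" in n}
    wanted, queue, seen = [], [root_id], set()
    while queue:
        node_id = queue.pop(0)
        if node_id in seen or node_id not in by_id:
            continue
        seen.add(node_id)
        wanted.append(by_id[node_id])
        queue.extend(blank_refs(by_id[node_id]))
    # No "@id" on the wrapper: the graph is generated on demand, not named.
    return {"@context": payload["@context"], "@graph": wanted}


ROOT = "https://credentialengineregistry.org/resources/ce-123"
payload = {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "@id": "https://credentialengineregistry.org/graph/ce-123",
    "@graph": [
        {"@id": ROOT, "@type": "ceterms:Credential",
         "ceterms:requires": [{"@type": "ceterms:ConditionProfile",
                               "ceterms:targetAssessment": [
                                   "https://credentialengineregistry.org/resources/ce-890",
                                   "_:ABC", "_:DEF"]}]},
        {"@id": "_:ABC", "@type": "ceterms:AssessmentProfile"},
        {"@id": "_:DEF", "@type": "ceterms:AssessmentProfile"},
    ],
}
doc = resource_graph(payload, ROOT)
```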

The canonical @id of the credential therefore is:

https://credentialengineregistry.org/resources/ce-123

At the end of the day I just kind of see them as two separate URIs for the same data. You could give me back the same document as a result of either URI and I'd be fine with it, but that's just my opinion.

Lomilar commented 6 years ago

> At the end of the day I just kind of see them as two separate URIs for the same data. You could give me back the same document as a result of either URI and I'd be fine with it, but that's just my opinion.

Yup. Take what I gave and do a JSON-LD Flatten on it, and you get a graph back out. It should be the same graph (with maybe different bnode IDs and a missing @id). That makes sense considering /resources/ and /graph/ are different web service invocations. The @id in the /graph/ result should invoke the /graph/ service and the @id in the /resources/ result should invoke /resources/.

It's just a different shape for the same data (to the RDF aware).
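
The flatten step can be approximated in a few lines of Python to show why the two shapes carry the same data. This is a simplified sketch, not the JSON-LD flattening algorithm: it assumes ids are aliased to `id`, it mints labels of the form `_:genN` only for nodes that lack an id (a real processor relabels all blank nodes), and it does not merge repeated nodes. `flatten` is a hypothetical helper.

```python
import itertools


def flatten(tree, id_key="id"):
    """Turn a nested document into a flat @graph of nodes linked by id."""
    nodes = []
    counter = itertools.count()

    def visit(value):
        if isinstance(value, list):
            return [visit(v) for v in value]
        if isinstance(value, dict):
            # Nodes without an id get a freshly minted blank-node label.
            node_id = value.get(id_key) or f"_:gen{next(counter)}"
            node = {id_key: node_id}
            nodes.append(node)  # appended first so parents precede children
            for key, val in value.items():
                if key not in (id_key, "@context"):
                    node[key] = visit(val)
            return node_id      # the parent keeps only a reference
        return value

    visit(tree)
    result = {"@graph": nodes}
    if "@context" in tree:
        result = {"@context": tree["@context"], **result}
    return result


tree = {
    "@context": "http://credreg.net/ctdl/schema/context/json",
    "id": "https://credentialengineregistry.org/resources/ce-123",
    "type": "Credential",
    "requires": {
        "type": "ConditionProfile",
        "targetAssessment": [
            "https://credentialengineregistry.org/resources/ce-890",
            {"type": "AssessmentProfile"},
            {"type": "AssessmentProfile"},
        ],
    },
}
flat = flatten(tree)
```

The minted labels are arbitrary, which is why two flattenings of the same data can differ in bnode ids while describing the same graph.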

siuc-nate commented 6 years ago

@Lomilar Agreed, the complexity can be dealt with and worked around, but rather than require everyone who publishes or consumes to implement complexity handling, I think it's better to just keep the data simple to begin with. Then nobody has to translate our schema into something they can understand; they can just work with it out of the box.

My sense is that if our data is so inconsistent or confusing that it always (or nearly always) needs to be heavily preprocessed, dramatically transformed, and/or run through a lengthy decision tree before it can be understood by anyone, then we have done something wrong.

Just my opinion, though.

Lomilar commented 6 years ago

The war of inconsistent flexibility vs consistent complexity is a tough one. I think we've found the fence that separates us.

Either way, high degrees of adoption require code libraries anyway to transfer whatever is done into the native paradigm of the language, so it may matter a little bit less anyway.

siuc-nate commented 6 years ago

I prefer consistent simplicity (with flexibility as more of an extension rather than a foundation), myself, but that is tough to come by sometimes - anyway, I digress. I look forward to Stuart's take on the question.

Other than that, I think we're all on the same page as far as the rest of it goes, so does anyone see a reason why we shouldn't move forward with our various implementations based on:

stuartasutton commented 6 years ago

Guys, these skeleton records don't cut it for me. See what you get with these nodeID blank nodes when you add more data than just the @id and @type--like add name etc. Don't stop with not getting any errors as jsonld! Run it through your tests with jsonld playground AND translate them into turtle and rdf/xml with something like the Good Relations translator (http://rdf-translator.appspot.com/) or easy rdf (http://www.easyrdf.org/converter) AND the W3C RDF validator with any resulting rdf/xml (if you get so far as translating json-ld to rdf/xml).

What happens to that additional bnode data when you look at it in jsonld playground?

Lomilar commented 6 years ago

I modified the context, so the above examples don't work as is.

The below validates using the easyrdf converter and the W3C RDF validator. The bnode ids are replaced when looking at the data in triples, which verifies my concern that bnode ids aren't respected (and are regenerated at will).

{
  "@context": {"actionStat":"http://purl.org/ctdl/vocabs/actionStat/","agentSector":"http://purl.org/ctdl/vocabs/agentSector/","asn":"http://purl.org/ASN/schema/core/","assessMethod":"http://purl.org/ctdl/vocabs/assessMethod/","assessUse":"http://purl.org/ctdl/vocabs/assessUse/","audience":"http://purl.org/ctdl/vocabs/audience/","audLevel":"http://purl.org/ctdl/vocabs/audLevel/","ceterms":"http://purl.org/ctdl/terms/","@vocab":"http://purl.org/ctdl/terms/","claimType":"http://purl.org/ctdl/vocabs/claimType/","costType":"http://purl.org/ctdl/vocabs/costType/","credentialStat":"http://purl.org/ctdl/vocabs/credentialStat/","creditUnit":"http://purl.org/ctdl/vocabs/creditUnit/","dc":"http://purl.org/dc/elements/1.1/","dct":"http://purl.org/dc/terms/","deliveryType":"http://purl.org/ctdl/vocabs/deliveryType/","foaf":"http://xmlns.com/foaf/0.1/","inputType":"http://purl.org/ctdl/vocabs/inputType/","learnMethod":"http://purl.org/ctdl/vocabs/learnMethod/","lrmi":"http://purl.org/dcx/lrmi-terms/","meta":"http://credreg.net/meta/terms/","obi":"https://w3id.org/openbadges#","orgType":"http://purl.org/ctdl/vocabs/orgType/","owl":"http://www.w3.org/2002/07/owl#","purpose":"http://purl.org/ctld/vocabs/purpose/","rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#","rdfs":"http://www.w3.org/2000/01/rdf-schema#","residency":"http://purl.org/ctdl/vocabs/residency/","schema":"http://schema.org/","score":"http://purl.org/ctdl/vocabs/score/","serviceType":"http://purl.org/ctdl/vocabs/serviceType/","skos":"http://www.w3.org/2004/02/skos/core#","vann":"http://purl.org/vocab/vann/","vs":"https://www.w3.org/2003/06/sw-vocab-status/ns","xsd":"http://www.w3.org/2001/XMLSchema#","ceterms:addressCountry":{"@container":"@language"},"ceterms:addressLocality":{"@container":"@language"},"ceterms:addressRegion":{"@container":"@language"},"ceterms:agentPurpose":{"@type":"@id"},"ceterms:agentPurposeDescription":{"@container":"@language"},"ceterms:alternateName":{"@container":"@language"},"ceterms:assessmentExample":{"@type":"@id"},"ceterms:assessmentExampleDescription":{"@container":"@language"},"ceterms:assessmentOutput":{"@container":"@language"},"ceterms:availabilityListing":{"@type":"@id"},"ceterms:availableOnlineAt":{"@type":"@id"},"ceterms:commonConditions":{"@type":"@id"},"ceterms:commonCosts":{"@type":"@id"},"ceterms:condition":{"@container":"@language"},"ceterms:contactOption":{"@container":"@language"},"ceterms:contactType":{"@container":"@language"},"ceterms:costDetails":{"@type":"@id"},"ceterms:creditHourType":{"@container":"@language"},"ceterms:creditUnitTypeDescription":{"@container":"@language"},"ceterms:deliveryTypeDescription":{"@container":"@language"},"ceterms:demographicInformation":{"@container":"@language"},"ceterms:description":{"@container":"@language"},"ceterms:evidenceOfAction":{"@type":"@id"},"ceterms:experience":{"@container":"@language"},"ceterms:externalResearch":{"@type":"@id"},"ceterms:familyName":{"@container":"@language"},"ceterms:framework":{"@type":"@id"},"ceterms:frameworkName":{"@container":"@language"},"ceterms:geoURI":{"@type":"@id"},"ceterms:givenName":{"@container":"@language"},"ceterms:hasConditionManifest":{"@type":"@id"},"ceterms:hasCostManifest":{"@type":"@id"},"ceterms:honorificSuffix":{"@container":"@language"},"ceterms:identifierType":{"@container":"@language"},"ceterms:image":{"@type":"@id"},"ceterms:isSimilarTo":{"@type":"@id"},"ceterms:keyword":{"@container":"@language"},"ceterms:missionAndGoalsStatement":{"@type":"@id"},"ceterms:missionAndGoalsStatementDescription":{"@container":"@language"},"ceterms:name":{"@container":"@language"},"ceterms:paymentPattern":{"@container":"@language"},"ceterms:processFrequency":{"@container":"@language"},"ceterms:processMethod":{"@type":"@id"},"ceterms:processMethodDescription":{"@container":"@language"},"ceterms:processStandards":{"@type":"@id"},"ceterms:processStandardsDescription":{"@container":"@language"},"ceterms:revocationCriteria":{"@type":"@id"},"ceterms:revocationCriteriaDescription":{"@container":"@language"},"ceterms:sameAs":{"@type":"@id"},"ceterms:scoringMethodDescription":{"@container":"@language"},"ceterms:scoringMethodExample":{"@type":"@id"},"ceterms:scoringMethodExampleDescription":{"@container":"@language"},"ceterms:socialMedia":{"@type":"@id"},"ceterms:source":{"@type":"@id"},"ceterms:streetAddress":{"@container":"@language"},"ceterms:subjectWebpage":{"@type":"@id"},"ceterms:submissionOf":{"@container":"@language"},"ceterms:targetNode":{"@type":"@id"},"ceterms:targetNodeDescription":{"@container":"@language"},"ceterms:targetNodeName":{"@container":"@language"},"ceterms:taskDetails":{"@type":"@id"},"ceterms:url":{"@type":"@id"},"ceterms:verificationDirectory":{"@type":"@id"},"ceterms:verificationMethodDescription":{"@container":"@language"},"ceterms:verificationService":{"@type":"@id"},"meta:domainFor":{"@type":"@id"},"meta:hasConcept":{"@type":"@id"},"meta:moreInformation":{"@type":"@id"},"meta:objectText":{"@container":"@language"},"meta:supersededBy":{"@type":"@id"},"meta:targetScheme":{"@type":"@id"},"rdfs:subclassOf":{"@type":"@id"},"owl:equivalentProperty":{"@type":"@id"},"owl:equivalentClass":{"@type":"@id"},"schema:domainIncludes":{"@type":"@id"},"schema:rangeIncludes":{"@type":"@id"},"owl:inverseOf":{"@type":"@id"},"skos:broader":{"@type":"@id"},"skos:narrower":{"@type":"@id"},"skos:inScheme":{"@type":"@id"},"vs:term_status":{"@type":"@id"},"skos:changeNote":{"@type":"@id"},"rdfs:label":{"@container":"@language"},"rdfs:comment":{"@container":"@language"},"dct:description":{"@container":"@language"},"vann:usageNote":{"@container":"@language"},"skos:prefLabel":{"@container":"@language"},"skos:definition":{"@container":"@language"},"id":{"@id":"@id"},"type":{"@id":"@type"},"Credential":{"@id":"ceterms:Credential"},"ConditionProfile":{"@id":"ceterms:Credential"},"targetAssessment":{"@id":"ceterms:targetAssessment","@type":"@id"}},
  "id": "https://credentialengineregistry.org/resources/ce-123",
  "type": "Credential",
  "name":"The credential",
  "requires": {
    "id":"_:bnode0",
    "type": "ConditionProfile",
    "targetAssessment": [
      "https://credentialengineregistry.org/resources/ce-890",
      {
        "id":"_:bnode1",
        "type": "AssessmentProfile",
        "name":"The first assessment profile"
      },
      {
        "id":"_:bnode2",
        "type": "AssessmentProfile",
        "name":"The second assessment profile"
      }
    ],
    "name":"The condition profile"
  }
}

siuc-nate commented 6 years ago

I've been discussing this implementation with @cwd-mparsons and we have a question:

Is there any reason not to include the ceterms:ctid at the @graph level? This should: