@context problem (GitHub issue)

siuc-nate commented 5 years ago

@stuartasutton @Lomilar @cwd-mparsons @rsaksida I have twice run into an issue when trying to use the CTDL @context to parse documents: There is no way (currently) to cleanly and reliably tell which properties are supposed to be URLs (e.g. ceterms:subjectWebpage) and which properties are supposed to be URIs that point to things (e.g. ceterms:ownedBy).

Currently the @context uses { "@type": "@id" } for both cases – I can't find anything obvious that seems to indicate whether or not this is truly correct. Should the @context instead use { "@type": "xsd:anyURI" } for URL properties? Would this break anything? Or would it make the @context clearer?

I know that @Lomilar and @rsaksida both use the @context in their code and @stuartasutton uses it for various other things so I wanted to get their feedback before making this kind of change.

Alternatively, is there a way to append some other property to the @context that indicates which properties are which while leaving the { "@type": "@id" } designation as-is?

Lomilar commented 5 years ago

I'm like, 60% confident in this answer.

xsd:anyURI would probably work to distinguish URL properties, though I've never seen that practice before, and I'm not sure many people would understand it.

The cleanest would be to have a ~~schema:CreativeWork~~ schema:WebSite with sameAs defined as a "URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website."

Then, the leap to a web page would be explicit.

That being said, { "@type": "@id" } is correct for both.

In the linked data world (and you'll have to forgive me for this assertion, but I believe it to be true), content negotiation is expected, or a web page can conceivably be expected to carry enough linked data to describe itself (as headers, microdata, or otherwise). Consuming applications should be able to extract that linked data; if an application can't, and it gets HTML back, then it knows it's a website.

The web is kinda messy in this sense, in that data for humans and data for machines is intermixed.

siuc-nate commented 5 years ago

What about coming up with some arbitrary pair of types that are effectively subclasses of @id? Something like this (though I'm not proposing these specific type names, of course):

{
  "@context": {
    "ceterms:subjectWebpage": { "@type": "meta:vanillaURL" },
    "ceterms:offeredBy": { "@type": "meta:objectReference" },
    "meta:vanillaURL": { "@type": "@id" },
    "meta:objectReference": { "@type": "@id" }
  }
}

This assumes there are parsing/processing rules or norms around types that essentially inherit from (or extend? I'm not sure which analogy really applies here) other types. And that's what I'm not sure of.

Assuming this isn't where my entire idea breaks down, though, it would (I think?) allow a consuming system that cares about which types of @id are which to process them accordingly, and everything else to just process them as normal @ids.

As best I can tell, @type must be a string rather than an array, so something like { "@type": [ "@id", "xsd:anyURI" ] } is out.

The only other thing I can think of would be to add some arbitrary properties to the context itself, which the JSON-LD spec folks don't seem very keen on allowing. Something like:

{
  "@context": {
    "schema": "http://schema.org/",
    "ceterms:subjectWebpage": { 
      "@type": "@id",
      "schema:additionalType": "xsd:anyURI" 
    },
    "ceterms:offeredBy": { 
      "@type": "@id",
      "schema:additionalType": "schema:Thing (or maybe ceterms:Agent, but that's beyond the scope here)"
    }
  }
}

Though as I think about it more, I suppose that if a given property is only ever supposed to be interpreted as a URL rather than a URI, it might(?) be semantically correct to refer to such properties as { "@type": "xsd:anyURI" } after all?

@stuartasutton @Lomilar thoughts?

Lomilar commented 5 years ago

There are indeed processing rules around these. If we look at:

{
  "@context":{
    "ns":"http://my.schema/",
    "xsd":"http://my.schema/",
    "ns:foo":{
      "@type":"@id"
    },
    "ns:bar":{
      "@type":"xsd:anyURI"
    }
  },
  "@id":"http://some.thing/1",
  "ns:foo":"http://a.link/",
  "ns:bar":"http://a.link/"
}

and expand it, we get:

[
  {
    "@id": "http://some.thing/1",
    "http://my.schema/bar": [
      {
        "@type": "http://my.schema/anyURI",
        "@value": "http://a.link/"
      }
    ],
    "http://my.schema/foo": [
      {
        "@id": "http://a.link/"
      }
    ]
  }
]

So systems will treat these differently at some point in the future.
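A minimal Python sketch of the rule behind this difference, for a single string value. It is a simplification, not a conforming JSON-LD processor: it handles only the "@id" and datatype cases discussed here, resolves prefixes by plain string substitution, and assumes the value is already an absolute IRI (a real processor would also resolve it against the base IRI).

```python
from typing import Any, Dict


def expand_iri(term: str, prefixes: Dict[str, str]) -> str:
    """Expand a compact IRI such as 'xsd:anyURI' using a prefix map."""
    prefix, sep, suffix = term.partition(":")
    # "http://..." style values are already absolute IRIs and are left alone.
    if sep and prefix in prefixes and not suffix.startswith("//"):
        return prefixes[prefix] + suffix
    return term


def expand_value(term_def: Dict[str, Any], value: str,
                 prefixes: Dict[str, str]) -> Dict[str, str]:
    """Expand one string value according to its term definition's @type."""
    type_mapping = term_def.get("@type")
    if type_mapping == "@id":
        # Node reference: the string identifies a thing.
        return {"@id": value}
    if type_mapping is not None:
        # Typed literal: the string stays a value, tagged with a datatype IRI.
        return {"@type": expand_iri(type_mapping, prefixes), "@value": value}
    # No type coercion: a plain string value.
    return {"@value": value}
```

With the prefix map from the example above (where xsd is deliberately mapped to http://my.schema/), the xsd:anyURI term produces the same typed value object as the expansion output, while the "@id" term produces a node reference.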

siuc-nate commented 5 years ago

I knew about expansion, but I wasn't sure whether the notion of inheritance/extension/multi-typing was a thing or not (as shown in the first example of my previous post). E.g., property A is Type B and Type B is (also?) Type C.

Lomilar commented 5 years ago

Right, I'm not sure if @id has a type, as it's elemental to RDF. Not sure if RDF type theory can say that something is a subclass of whatever @id is in JSON-LD. Maybe a question for sporny or some other JSON-LD expert.

siuc-nate commented 5 years ago

In the case above, it'd be the other way around - the custom type would, in turn, have a type (which would be @id) which may(?) get around that fundamental nature. But yes, this is probably a question for someone like that.

jeannekitchens commented 5 years ago

@siuc-nate is this a registry issue or a vocabulary issue, and what is the status for resolving it?

siuc-nate commented 5 years ago

This is a question of how we should indicate, via the context, which properties in the schema are URI references (to objects) and which ones should be explicitly handled as URLs (to web pages) - in other words, how to use the context to tell a consuming system "fetch the value of property A as JSON-LD and render it, but render the value of property B as a link someone can click on". The problem is that both types of URIs currently get assigned "@type": "@id" in the context, per the JSON-LD spec (which makes no distinction between pointers to "things" and pointers to webpages, which are technically also "things").

I suspect it may be as simple as changing the URL properties to have "@type": "xsd:anyURI", but I don't want to do that without getting @stuartasutton and @Lomilar's input.

stuartasutton commented 5 years ago

Nate, it might mean we have to experiment. In RDF, the "give me data" / "give me HTML" distinction is a function of content negotiation (if I am understanding this correctly). How that translates to JSON-LD is beyond me.

siuc-nate commented 5 years ago

For objects identified by (resolvable) URIs, yes - but for data that is semantically a URL (e.g., subjectWebpage), where there is no intent or implication of "there is a thing on the other end of this, go get it", I think it makes sense to treat them like URLs (URL literals?). If that makes sense.

stuartasutton commented 5 years ago

Nate, in terms of HTTP underlying RDF, there is always "something" on the other end (i.e., no 404s) and it's only a difference between the nature of what's returned (html content, some other content type, or code) and its treatment.

siuc-nate commented 5 years ago

Yes, for objects, but that's not what we're trying to express. We're trying to express "this string is a URL (literal), treat it like a URL (literal)." In thinking about it, xsd:anyURI may be incorrect as well (as we do use it in other places to imply that there is an object to be retrieved).

What I'm looking for, I think, is a @type that is essentially a subclass of xsd:string and which indicates a URL-formatted string that complies with whatever specification is appropriate for URL-formatted strings.

Alternatively (and perhaps preferably), some way to enhance the @context to indicate that such properties may be @type: @id but they should still be treated as URL literals. Something that says "this points to a webpage, don't bother resolving it like you would for properties known to point to data".

Perhaps that could be accomplished by making it possible to resolve the non-normative "groups" we show on the terms page of the credreg site? That may be worth exploring.

stuartasutton commented 5 years ago

If it is absolutely not to be resolved, treat it literally as a literal: i.e., rdfs:Literal or xsd:string.

philbarker commented 5 years ago

Is "@type" : "schema:URL" an option?

https://schema.org/URL (DataType > Text > URL): "Data type: URL."

Lomilar commented 5 years ago

Putting the onus on the data consumer to do an HTTP request (HEAD, probably), go through the content negotiation step, and handle whatever is on the other side still seems like the right idea to me here. Distinguishing between a URL and a URL that people are supposed to understand doesn't seem like something that should be prescribed at the schema level. At the use case level, sure, but there are mechanisms for that (HTTP Accept request headers, HTTP Content-Type response headers).

stuartasutton commented 5 years ago

Phil, I had assumed that schema:URL was intended to do exactly what is being discussed...i.e., used in marking up a page that actually displays a URL literally in the text being marked up.

siuc-nate commented 5 years ago

I had looked at the spec for schema:URL earlier - it defines a whole bunch of properties, so I wasn't sure if it would be "correct" to use it here (that is to say, I wasn't sure if it would cause publishers/consumers to expect a JSON-LD object with a @type of schema:URL and some number of properties inside of it when really the value is just a URL-formatted string).

Fritz also does raise an interesting point about (not) doing this at the schema level. It may be that having a secondary service (a sort of pseudo-context?) that can be retrieved to get such info may be useful in other cases, so perhaps it's worth pursuing what I mentioned in my previous post. Though it may also be nice(r?) to do that in the vanilla context itself, since a consuming system can just ignore properties it doesn't understand/care about there - but I don't know if that's valid, since, when I last checked, I wasn't able to find anything concrete about whether or not you can/should append extra "custom" stuff to the JSON-LD context, or how to do so.

Another alternative might be to simply append another meta property (or maybe schema:additionalType?) to the JSON-LD encoding of the properties in question - that shouldn't raise any spec problems, but it also means you would have to consume the encoding of every term in the schema in order to get that information, rather than just getting it from the (pseudo?-) context.

Regardless, I would still need to know what type to use - again, a subclass of xsd:string indicating a URL-formatted string. I imagine that already exists somewhere.

philbarker commented 5 years ago

No, schema:URL is a class with no properties. It's simply Text that is a URL. The properties listed at https://schema.org/URL are those that may have a schema:URL as their value.

siuc-nate commented 5 years ago

I totally missed that that was what is usually the second table (and the accompanying text above it) - good catch. In that case, that may be the correct data type then. Thanks.

So that leaves the issue of how best to communicate it:

Option 1: Use it in the context (the normal way):

{
  "@context": {
    "ceterms:subjectWebpage": { "@type": "schema:URL" }
  }
}

Pros: It fits the spec, and should work with any system.
Cons: It maybe violates the spirit of RDF(?) by doing this at the schema level (per @Lomilar's point).

Option 2: Use it in the context (the questionable way):

{
  "@context": {
    "ceterms:subjectWebpage": { "@type": "@id", "schema:additionalType": "schema:URL" }
  }
}

Pros: It lets a system treat the property as either a URL (literal) or something that should be retrieved anyway (arguably this could apply to option 1 as well), may be a pattern that could be followed in the future for any other properties that can benefit from additional "hints" about how to process them, and should not break any existing systems.
Cons: I haven't been able to find anything in the JSON-LD spec that says this is legal.

Option 3: Use a secondary service (Assuming a secondary URL, say, "http://credreg.net/ctdl/schema/contextra/json" is made available)

{
  "meta:extraContext": {
    "ceterms:subjectWebpage": { "schema:additionalType": "schema:URL" }
  }
}

Pros: We can put whatever we want in this as its usage evolves.
Cons: Consuming systems will have to know to retrieve it, will have to actually go about retrieving it, and then have to understand its contents once they do retrieve it.
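To make the consumer-side cost of option 3 concrete, here is a rough Python sketch. It assumes the hypothetical meta:extraContext document shape shown above; neither that document nor the meta: prefix exists yet, and the function name is illustrative only.

```python
import json
from typing import Set


def link_properties(extra_context_json: str) -> Set[str]:
    """Properties that a (hypothetical) hints document marks as plain web links."""
    doc = json.loads(extra_context_json)
    hints = doc.get("meta:extraContext", {})
    return {
        prop
        for prop, hint in hints.items()
        if isinstance(hint, dict)
        and hint.get("schema:additionalType") == "schema:URL"
    }
```

Anything not in the returned set would fall back to ordinary "@id" handling, which leaves the main @context untouched; the price is that every consumer has to fetch and interpret this second document.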

Option 4: Put it in the encoding for the term. If we extended http://credreg.net/ctdl/terms/subjectWebpage/json with another property:

{
  "@id": "ceterms:subjectWebpage",
  "@type": "rdf:Property",
  ...
  "meta:processingHint": { "schema:additionalType": "schema:URL" }
}

Pros: Same as option 3.
Cons: Same as option 3, but worse, as this is even more open to interpretation (and requires loading more data).

In the end, I really like option 2 so far (since it could be potentially expanded upon later - which is maybe a dangerous road to go down), but would fall back to option 1 if that weren't feasible/allowed. Though having said that, option 1 may have a better chance of compatibility since systems wouldn't have to know to look for the additionalType property, and could still choose to resolve schema:URLs if they want to.

philbarker commented 5 years ago

On 05/03/2019 23:09, siuc-nate wrote:

> I totally missed that that was what is usually the second table (and the accompanying text above it) - good catch. In that case, that may be the correct data type then. Thanks.

yeah, it's a confusing oddity.

> So that leaves the issue of how best to communicate it:
>
> Option 1: use it in the context (the normal way):
>
> { "@context": { "ceterms:subjectWebpage": { "@type": "schema:URL" } } }
>
> Pros: It fits the spec, and should work with any system
> Cons: It maybe violates the spirit of RDF(?) by doing this at the schema level (per @Lomilar's point)

I don't have much to say on the pros and cons of the various options, but when it comes to both the spirit and letter of RDF the folk who wrote schema.org are people I would follow.


siuc-nate commented 5 years ago

I appreciate the input, Phil. Thanks.

I'm pretty sure the answer is either option one or two above - @stuartasutton @Lomilar @rsaksida do you have any thoughts on those?

stuartasutton commented 5 years ago

Nate, until you have a fully functional jsonld example that uses all four of your approaches, demonstrates a correct and useful outcome as jsonld, AND generates fully functional RDF (turtle, rdf/xml), I can't answer your question...

siuc-nate commented 5 years ago

If you take http://credreg.net/ctdl/schema/context/json and alter the object for ceterms:subjectWebpage to suit one of the first two options, that should be what you need. Will that work?

Or, I could have the system inject a dummy property (for testing) into the context for each of those two approaches if it needs to be a part of the actual context returned via that URL.

Lomilar commented 5 years ago

To be clear, JSON-LD processors won't see it as a link; they see it as typed data, similar to an xsd:string or a "@type": "https://purl.org/signature#publicKeyPem" (something we do in CaSS to type crypto keys).

That means that a human developer will have to expect and interpret a schema:URL in the same fashion as we have here. We're using a convention that has to be communicated.

{
  "@context": {
    "schema":"http://schema.org/",
    "ceterms:subjectWebpage": { "@type": "schema:URL" }
  },
  "ceterms:subjectWebpage":"http://foo.bar"
}

Expanded (what machines think of it)

[
  {
    "ceterms:subjectWebpage": [
      {
        "@type": "http://schema.org/URL",
        "@value": "http://foo.bar"
      }
    ]
  }
]

Compacted using the same @context

{
  "@context": {
    "schema": "http://schema.org/",
    "ceterms:subjectWebpage": {
      "@type": "schema:URL"
    }
  },
  "ceterms:subjectWebpage": "http://foo.bar"
}

As Quads:

_:b0 <ceterms:subjectWebpage> "http://foo.bar"^^<http://schema.org/URL> .
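The quad above is where the two encodings visibly diverge. A minimal Python sketch of how an expanded value object becomes the object term of an N-Quads line; it covers only the two value shapes discussed in this thread (no language tags, lists, or named graphs), and the subject is assumed to be passed in already formatted, e.g. as a blank node label.

```python
from typing import Dict

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def nquads_object(value_obj: Dict[str, str]) -> str:
    """Serialize one expanded value object as the object of an N-Quads line."""
    if "@id" in value_obj:
        iri = value_obj["@id"]
        # Node references become IRIs, or stay blank nodes if they start with "_:".
        return iri if iri.startswith("_:") else f"<{iri}>"
    # Escape characters that are not allowed raw inside an N-Quads string.
    lexical = (value_obj["@value"]
               .replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    datatype = value_obj.get("@type", XSD_STRING)
    if datatype == XSD_STRING:
        # xsd:string is the default datatype and is normally left implicit.
        return f'"{lexical}"'
    return f'"{lexical}"^^<{datatype}>'


def nquad(subject: str, predicate: str, value_obj: Dict[str, str]) -> str:
    """Build one statement in the default graph (so no graph term)."""
    return f"{subject} <{predicate}> {nquads_object(value_obj)} ."
```

Fed the expanded schema:URL value, this yields the literal-with-datatype line shown above; fed {"@id": "http://foo.bar"} instead, the object becomes the IRI <http://foo.bar>, which is what a "@type": "@id" context would produce.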

siuc-nate commented 5 years ago

Thanks, @Lomilar. How does that same set of tests work out if option 2 is used?

Lomilar commented 5 years ago

Context doesn't parse, and it breaks the processor.

Lomilar commented 5 years ago

The reason I am being a purist here is:

Part of the "dream" as it were is that a website can be parsed for microdata or rdfa, as in:

https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fcredentialengine.org

So that even websites, or eventually, maybe, one day, documents can be used to produce JSON-LD, and every piece of linked data has a natural viewer. Whether a link goes to a website or to a piece of linked data then depends on "Well, what is asking, a human or a machine?"

This is why content negotiation is the correct answer: every registry URL should bring up the website displaying that data if Accept: text/html is sent, and return the linked data if Accept: application/json is sent.
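A minimal Python sketch of both halves of that negotiation from the consumer's side, using only the standard library. The function names are hypothetical, no real registry URL is assumed, and treating application/ld+json as linked data alongside application/json is an assumption beyond what is stated here.

```python
import urllib.request


def build_request(url: str, for_machine: bool) -> urllib.request.Request:
    """Ask the same URL for linked data (machine) or a web page (human)."""
    accept = "application/json" if for_machine else "text/html"
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def is_linked_data(content_type: str) -> bool:
    """Decide from a response's Content-Type whether data came back."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in {"application/json", "application/ld+json"}
```

To actually send a request, pass it to urllib.request.urlopen and check the response's Content-Type header with is_linked_data; an HTML response tells the consumer it reached a page meant for people.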

siuc-nate commented 5 years ago

I feel like option 2 would let us have our cake and eat it too in terms of the "dream" you describe.

> Context doesn't parse, and it breaks the processor.

Does this happen with multiple JSON-LD processors? I would hope that the norm would be for unknown properties to just get ignored (or, at most, flagged as something unusual), rather than completely breaking something.
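Fritz's result matches my reading of the JSON-LD 1.1 context-processing rules: an expanded term definition may only contain a fixed set of keywords, and any other entry (such as schema:additionalType) raises an invalid term definition error rather than being ignored, so a conforming processor would presumably reject option 2 regardless of implementation. The Python sketch below mimics only that check, plus the string-only @type rule noted earlier in the thread; it is not a full validator, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# Entries allowed in an expanded term definition, as I read the JSON-LD 1.1
# "Create Term Definition" algorithm; anything else is an error there.
ALLOWED_KEYS = {
    "@id", "@reverse", "@container", "@context", "@direction", "@index",
    "@language", "@nest", "@prefix", "@protected", "@type",
}


def term_definition_errors(context: Dict[str, Any]) -> List[str]:
    """List problems that would make a processor reject this context."""
    errors = []
    for term, definition in context.items():
        if not isinstance(definition, dict):
            continue  # simple mappings such as "schema": "http://schema.org/"
        extra = sorted(set(definition) - ALLOWED_KEYS)
        if extra:
            errors.append(f"{term}: unexpected entries {', '.join(extra)}")
        type_mapping = definition.get("@type")
        if type_mapping is not None and not isinstance(type_mapping, str):
            errors.append(f"{term}: @type must be a single string")
    return errors
```

Run against the option 2 context, this reports schema:additionalType as unexpected; the option 1 context passes.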

stuartasutton commented 5 years ago

Nate, we should not overreach but cleave close to what we know works in known ways. I am going to make two points: the first on the use of schema.org/URL class and the second on whether we should be using anything like this at all at the schema level. This might be a bit lengthy.

Use of schema.org/URL class

It appears that you are wanting to enable having a URL that appears as a clickable link string in data and on the page. As pointed out by Phil, there is the schema.org/URL datatype class intended for just such a purpose. For example, below is a jsonld snippet from the schema.org documentation where we see such a use in a display of someone's contact information. In the last line, the schema.org/url property is used to point to an instance of datatype schema.org/URL which is intended to be displayed as an (active) link. [Of course, having this work in the wild will be dependent on how many parsers/applications pick up on the schema.org intended use of instances of the datatype.]

{
  "@context": "http://schema.org",
  "@type": "Person",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Seattle",
    "addressRegion": "WA",
    "postalCode": "98052",
    "streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
  },
  "colleague": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "email": "mailto:jane-doe@xyz.edu",
  "image": "janedoe.jpg",
  "jobTitle": "Professor",
  "name": "Jane Doe",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}

In a message above, Fritz demonstrated the use of the schema.org/URL class in context. As far as I am concerned, this closes the "how to do it" discussion.

{
  "@context": {
    "schema": "http://schema.org/",
    "ceterms:subjectWebpage": {
      "@type": "schema:URL"
    }
  },
  "ceterms:subjectWebpage": "http://foo.bar"
}

CTDL use of this schema.org/URL datatype mechanism

The example property used throughout this issue has been ceterms:subjectWebpage. Wouldn't declaring the range of ceterms:subjectWebpage as the schema.org/URL datatype functionally close the door on its use in content negotiation? In other words, since the datatype would purposely foreclose dereferencing the URI in favor of a static literal expressed as a clickable link, there would obviously be no negotiating transaction? This is absolutely not the intended linked data behavior we want on this (and other like) properties in the wild. So I'm probably joining the chorus saying that this is a problem to be solved at the application level and not at the schema term declaration level.

siuc-nate commented 5 years ago

Right, which is why I'm frustrated that option 2 seems to be invalid, as it means every application will have to either retrieve (from some central source I could set up) or maintain a list of which properties to retrieve and which ones to treat as links, or, as Fritz hinted at, just try to resolve all of them and only care about the ones that return JSON (which seems like an inefficient way to do things, to me).

I'm fine with leaving them as xsd:anyURI in the spirit of linked data, but that still leaves the practical, right-now problem. Perhaps it will need to be left up to the application after all.

I still don't understand why extra properties in the context wouldn't simply be ignored if they aren't understood, though - that seems to be the norm in terms of how browsers handle things everywhere else (and how RDF is supposed to work, right?). But that's outside the scope of this issue.

stuartasutton commented 5 years ago

Nate, is it correct to say that there are the following three states:

The http string is treated as an object property resolving the URI;
The http string is treated as a plain literal; and
The http string is treated as a visible string that is a clickable URL

Is this correct?

The first is achieved with properties declared as "@type": "@id". The second is achieved in json-ld simply by not declaring a type (the default is a string) or by declaring xsd:string. The third is achieved by using schema:url. Am I correct that in markup the schema:url appears as an actionable link?
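The three states map directly onto what a consumer would read from each term definition. A minimal Python sketch of that mapping, assuming the consumer compares compact type names as they appear in the context (a sturdier consumer would compare fully expanded IRIs); the context fragment is hypothetical, not the real CTDL context.

```python
from enum import Enum
from typing import Any


class Treatment(Enum):
    RESOLVE = "resolve"  # "@type": "@id": dereference the URI
    LINK = "link"        # "@type": "schema:URL": show as a clickable link
    LITERAL = "literal"  # no type, xsd:string, or another datatype: plain text


# Assumption: compact and full spellings of schema:URL are both accepted.
URL_TYPES = {"schema:URL", "http://schema.org/URL", "https://schema.org/URL"}


def classify(term_def: Any) -> Treatment:
    """Map a context entry to one of the three treatments."""
    type_mapping = term_def.get("@type") if isinstance(term_def, dict) else None
    if type_mapping == "@id":
        return Treatment.RESOLVE
    if type_mapping in URL_TYPES:
        return Treatment.LINK
    return Treatment.LITERAL


# Hypothetical context fragment for illustration only.
context = {
    "schema": "http://schema.org/",
    "ceterms:offeredBy": {"@type": "@id"},
    "ceterms:subjectWebpage": {"@type": "schema:URL"},
    "ceterms:name": {"@type": "xsd:string"},
}
treatments = {
    term: classify(definition)
    for term, definition in context.items()
    if ":" in term  # skip bare prefix definitions such as "schema"
}
```

Note that the LINK branch is exactly the convention Fritz describes: a JSON-LD processor only sees a typed literal, so the "render as a link" behavior exists only because the consumer was told to look for schema:URL.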

siuc-nate commented 5 years ago

We haven't tried using schema:url as a @type (nor have we tried using xsd:anyURI), but it sounds like it should work(?). Again, I'm not sure if it would make for a valid JSON-LD @context or not.

philbarker commented 5 years ago

> We haven't tried using schema:url as a @type

should be schema:URL

siuc-nate commented 4 years ago

Resurfacing this again, as it has come up in the context of query translation for SPARQL. Specifically, I want to enable doing partial string matches on URL fields (like subjectWebpage) without having to first convert them to strings (for performance reasons). In other words:

?s ceterms:subjectWebpage ?o filter(regex(str(?o), '.*blah.*'))

is slower than

?s ceterms:subjectWebpage ?o filter(regex(?o, '.*blah.*'))

and right now the @context-driven query builder doesn't see any difference between a ceterms:offeredBy and a ceterms:subjectWebpage, and treats both of them as URIs.
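A minimal Python sketch of how a context-driven query builder could choose the filter form from the term definition; the function name is hypothetical. One caveat worth checking before changing the @context: as far as I can tell from SPARQL 1.1, REGEX only accepts string literals (plain or xsd:string), so a value typed as schema:URL would likely still need STR(). Only an untyped (or xsd:string) literal can be matched directly.

```python
from typing import Any

# Datatypes SPARQL treats as plain string literals (compact and full forms).
XSD_STRING = {"xsd:string", "http://www.w3.org/2001/XMLSchema#string"}


def regex_filter(var: str, term_def: Any, pattern: str) -> str:
    """Build a partial-match FILTER for ?var based on its context entry."""
    type_mapping = term_def.get("@type") if isinstance(term_def, dict) else None
    if type_mapping is None or type_mapping in XSD_STRING:
        # Plain string literal: REGEX can work on the value directly.
        target = f"?{var}"
    else:
        # IRI ("@id") or another datatype such as schema:URL: convert first.
        target = f"STR(?{var})"
    # Escape the pattern for use inside a double-quoted SPARQL string.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return f'FILTER(REGEX({target}, "{escaped}"))'
```

Since SPARQL REGEX is unanchored, the pattern "blah" matches the same values as '.*blah.*' in the queries above.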

There are various workarounds for this, but it is another case of "it would be really useful to be able to tell the difference via the @context" so that it would just work.

My worry now is that changing the @context risks breaking something.

stuartasutton commented 4 years ago

Nate, you either have to do some kind of workaround or convince the appropriate W3C work groups that RDF needs to be extended to include the notion underlying schema:URL--i.e., this is a special kind of string entity that does not resolve but rather plants the text on a page as an actionable URL. If it is truly something that's relentlessly under your skin, then, as Lady Macbeth says, "screw your courage to the sticking place", join the RDF group and voice your concern.

CredentialEngine / Schema-Development

@context problem #560
