CredentialEngine / Schema-Development

Development of the vocabularies for the CTI models

Reconsider usage of xsd:date #945

Open siuc-nate opened 2 months ago

siuc-nate commented 2 months ago

We have current use cases that require support for partial dates, e.g. 2024-08 and/or 2024. I thought these were supported by xsd:date, but they are not: ISO 8601, which xsd:date is based on, supports them, but xsd:date itself does not. See: https://www.w3.org/TR/xmlschema-2/#date https://www.w3.org/TR/xmlschema-2/#isoformats https://www.w3.org/TR/xmlschema-2/#truncatedformats

> Right truncated formats are also, in general, not permitted for the datatypes defined in this specification with the following exceptions: right-truncated representations of dateTime are used as lexical representations for date, gMonth, gYear.
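To make the mismatch concrete, here is a minimal sketch that checks a value against the lexical shape of each datatype. The patterns are simplified assumptions (four-digit, non-negative years; shape only, so a month of 13 would still pass), and `XSD_LEXICAL` and `matches` are hypothetical names, not part of any library.

```python
import re

# Optional timezone suffix, as allowed on these XSD datatypes.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"

# Simplified lexical patterns (assumption: four-digit, non-negative years only).
XSD_LEXICAL = {
    "xsd:date": re.compile(r"\d{4}-\d{2}-\d{2}" + _TZ),
    "xsd:gYearMonth": re.compile(r"\d{4}-\d{2}" + _TZ),
    "xsd:gYear": re.compile(r"\d{4}" + _TZ),
}


def matches(datatype: str, value: str) -> bool:
    """Return True if value has the lexical shape of the given datatype."""
    return XSD_LEXICAL[datatype].fullmatch(value) is not None


# Show which datatypes each example value could belong to.
for v in ("2024-08-15", "2024-08", "2024"):
    print(v, {name: matches(name, v) for name in XSD_LEXICAL})
```

Under these patterns, only the full YYYY-MM-DD value has the shape of an xsd:date; the partial values fall under xsd:gYearMonth and xsd:gYear respectively.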

I can see two potential solutions:

The second option is probably the safer of the two, as it avoids opening the door to other variations/truncations ISO8601 allows. That would also give us the flexibility to allow/deny, for example, YYYY-MM-DD and YYYY-MM formats but not YYYY formats for some properties (if we ever wanted to do that).

We only need to update the properties where we specifically want to allow partial dates, but if we were to update all properties with a range of xsd:date, the list would be:

Note: I can't think of a case where we need to worry about partial xsd:dateTimes, so we're probably okay with those properties as-is, but they're worth considering, too.

philbarker commented 2 months ago

For what it is worth I tend to use the latter approach, gYear, gYearMonth and the like.

siuc-nate commented 2 months ago

That's what I'm currently leaning towards as well.

philbarker commented 1 month ago

@siuc-nate one fly in the ointment with allowing more than one datatype for the date properties is that currently the context file has things like

    "ceterms:dateEffective": {
      "@type": "xsd:date"
    },

That would have to be removed and cannot be replaced if the datatype for values could be one of several options. The only way to specify the data type would be in the instance data, e.g.

    {
      "@id": "http://example.org/something",
      "ceterms:dateEffective": {
        "@type": "http://www.w3.org/2001/XMLSchema#gYear",
        "@value": "2011"
      }
    }

Alternatively, use "duck typing" on ingest (from the options in the property's range) as we discussed on the last call.
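As a rough illustration of what duck typing on ingest could look like, the sketch below picks a datatype from the shape of the incoming string and emits an explicitly typed value object. It assumes the property's range is xsd:date, xsd:gYearMonth, and xsd:gYear, ignores timezone suffixes, and uses a hypothetical function name (`infer_typed_value`); it is not how any existing ingest pipeline is known to work.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD (assumption: no timezone suffix, four-digit years).
_PARTIAL = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", re.ASCII)


def infer_typed_value(raw: str) -> dict:
    """Guess the XSD datatype of a date-like string and return a JSON-LD value object."""
    m = _PARTIAL.fullmatch(raw)
    if m is None:
        raise ValueError(f"{raw!r} does not match any datatype in the assumed range")
    year, month, day = m.groups()
    # date() raises ValueError for out-of-range values such as month 13 or Feb 30.
    date(int(year), int(month or 1), int(day or 1))
    if day is not None:
        datatype = "xsd:date"
    elif month is not None:
        datatype = "xsd:gYearMonth"
    else:
        datatype = "xsd:gYear"
    return {"@type": datatype, "@value": raw}


print(infer_typed_value("2011"))
print(infer_typed_value("2024-10"))
print(infer_typed_value("2024-10-01"))
```

The output of such a step is the same explicitly typed form as the instance data example, so the published data would not depend on the context to supply the datatype.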

siuc-nate commented 1 month ago

I wasn't aware of that limitation of JSON-LD until just now. That's really unfortunate and doesn't make much sense to me, but appears to be true. I went to see how schema.org handles multi-typing in the schema.org context file and found the following, some of which are odd to me:

This makes me question the usefulness of providing anything in the context file after the first block that provides what the URL prefixes map to. A consumer of the data is going to need to get the actual schema one way or another in order to find out what the ranges of properties are (especially for properties that are explicit URLs vs general URIs, as discussed elsewhere (#819)).

Also, the JSON-LD playground complains if you try this: [image]

Perhaps we should remove all of the @type references from our context as well, so as not to mislead consumers of the data?

philbarker commented 1 month ago

> I wasn't aware of that limitation of JSON-LD until just now. That's really unfortunate and doesn't make much sense to me,

I'm afraid it's necessary. The context block maps values in the JSON objects to RDF, and it can only do a one-to-one mapping that works for all values, so yeah if you give it an array and say the value may be any one of these three datatypes you're not giving it information that will work.
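A toy illustration of that one-to-one mapping, for what it's worth. This is a deliberately simplified imitation of type coercion, not the JSON-LD expansion algorithm, and `coerce` is a hypothetical helper: plain string values get whatever single @type the context gives the term, while values that already carry their own @type pass through unchanged.

```python
def coerce(context: dict, node: dict) -> dict:
    """Very rough imitation of JSON-LD type coercion for plain string values."""
    out = {}
    for term, value in node.items():
        term_type = context.get(term, {}).get("@type")
        if isinstance(value, str) and term_type and not term.startswith("@"):
            # The context can only supply one datatype per term.
            out[term] = {"@type": term_type, "@value": value}
        else:
            # Explicitly typed value objects (and keywords) are left alone.
            out[term] = value
    return out


context = {"ceterms:dateEffective": {"@type": "xsd:date"}}

full = coerce(context, {"ceterms:dateEffective": "2024-10-01"})
partial = coerce(context, {"ceterms:dateEffective": "2011"})
explicit = coerce(
    context,
    {"ceterms:dateEffective": {"@type": "xsd:gYear", "@value": "2011"}},
)
print(full, partial, explicit, sep="\n")
```

In this sketch the bare string "2011" ends up labelled as an xsd:date, which is exactly the mislabelling that a single context-level @type cannot avoid; only the explicitly typed value keeps xsd:gYear.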

> Perhaps we should remove all of the @type references from our context as well, so as not to mislead consumers of the data?

Then everything would be xsd:strings, which probably isn't an improvement.

Setting the range to xsd:date, xsd:gYearMonth, and xsd:gYear, and the @type in the context to schema:Date, would not be wrong, as those are all formats that are compatible with being a date value in ISO 8601 date format.

The alternative is Rohit's suggestion of a different property for each datatype.

siuc-nate commented 1 month ago

> Then everything would be xsd:strings, which probably isn't an improvement.

Wouldn't that also be a problem for schema.org's context? They don't specify a type value (in the context) for the vast majority of their properties, not even ones that would make sense as @type: @id. I think we create more confusion than we solve by specifying some types, but not others, or specifying them with a lesser degree of precision than that provided by the schema:rangeIncludes values for those properties. If people can figure out how to use schema.org's context with that as a limitation, it should work for us too, I'd think.

philbarker commented 1 month ago

Schema.org has a convention that you can use a string value for most things instead of an @id and puts the onus on data consumers to figure out what is meant:

"We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string, even if our schemas don't formally document that expectation."

https://schema.org/docs/datamodel.html

That makes being a data consumer of schema.org harder. Not an example I think we should follow.

I don't think data providers try to figure out how to use terms based on the @context; they mostly work from examples or the term definitions. The @context is just one way of getting from JSON to RDF, and we should make sure the context file we provide does as good a job of that as possible.

siuc-nate commented 6 days ago

Current proposals, per today's meeting:

Option one: Change the range of the properties listed above to use schema:Date instead, but limit our accepted variations to those described by xsd:date, xsd:gYearMonth, and xsd:gYear

Option two: Change the range of the properties listed above to use xsd:date, xsd:gYearMonth, and xsd:gYear

siuc-nate commented 6 days ago

@philbarker Can you check these with your JSON-LD validation? Ideally, the second approach would work, since I think that satisfies the issues that were brought up in today's meeting:

    {
        "@type": "ceterms:Certificate",
        "ceterms:dateEffective": {
            "@type": "schema:Date",
            "@value": "2024-10"
        }
    }

    {
        "@type": "ceterms:Certificate",
        "ceterms:dateEffective": {
            "@type": "xsd:gYearMonth",
            "@value": "2024-10"
        }
    }

siuc-nate commented 4 days ago

I was talking with @rohit-joy today and he raised some good ideas for how to tackle this:

  1. First, we determine which of the properties from the list above we do and don't want to allow partial dates for
  2. Second, of the ones for which we want to allow partial dates, determine which ones are ordinal (need to support sorting, calculations, etc.) and which ones are nominal (basically just used for display)
  3. Third, of the ones that are ordinal, determine how best to move forward (e.g. have/create a separate property that is explicitly an xsd:date that doesn't support partial dates and an equivalent property that does, as text/string)

The intent is to make the data easier to consume and use for a variety of potential/likely use cases.
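To show why the ordinal case needs a decision, here is a small sketch of sorting mixed full and partial dates. It assumes one possible convention (treat a partial date as the earliest day it could denote) and a hypothetical helper name, `earliest_date`; nothing here has been agreed for the schema.

```python
from datetime import date


def earliest_date(value: str) -> date:
    """Map YYYY, YYYY-MM, or YYYY-MM-DD to the earliest date it could denote."""
    parts = [int(p) for p in value.split("-")]
    # Pad missing month/day with 1 (assumed convention, not a schema rule).
    parts += [1] * (3 - len(parts))
    year, month, day = parts
    return date(year, month, day)


values = ["2024-10", "2024", "2023-12-31", "2024-03-15"]
ordered = sorted(values, key=earliest_date)
print(ordered)
```

Note that under this convention "2024" and "2024-01-01" compare as equal, so any consumer doing sorting or calculations has to pick a rule like this for itself. That is the kind of ambiguity a separate, strictly xsd:date property for ordinal uses would avoid.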

philbarker commented 4 days ago

@siuc-nate OK, that makes sense.

philbarker commented 4 days ago

btw,

      {
        "@type": "xsd:gYear",
        "@value": "2024"
      }

is indeed valid & correct.