dcmi / dcap

DC Tabular Application Profile - supporting materials

Minimal profile defined by Schema.org classes? #62

Closed (tombaker closed 2 years ago)

tombaker commented 4 years ago

On today's call, we recalled that the most minimal profile could consist just of a list of properties, and Phil asked if a minimal profile could also consist of a list of Schema.org classes.

tombaker commented 4 years ago

@philbarker As I see it, Schema.org fits the RDF data model but presents classes along with sets of suggested properties that may be used to describe instances of those classes. My guess at what you mean is that it should be possible to say, simply, that some given metadata describes instances of, say, sdo:Book and sdo:Person. If so, might it be possible to say this, within the limitations of the current CSV model under discussion, in the following way?

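| Shape_ID | Property_ID | VType | VConstraints | VShape_ID |
|----------|-------------|-------|--------------|-----------|
| @book    |             |       |              |           |
|          | rdf:type    | URI   | sdo:Book     |           |
|          | sdo:creator | URI   |              | @person   |
| @person  |             |       |              |           |
|          | rdf:type    | URI   | sdo:Person   |           |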

This is not a pretty way to say such a seemingly simple thing, but I see no other obvious way to express this without somehow extending the CSV model, especially in light of our decision to design the CSV model for now for profiles based on RDF.
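For illustration only, here is a minimal Python sketch of how a tool might read such a table. It assumes the five column names from the comment above, assumes that a row with an empty Shape_ID belongs to the most recently named shape, and guesses the cell placement of `sdo:Book` and `@person`. The function name `read_shapes` is hypothetical and not part of any DCMI tooling.

```python
import csv
import io

# The table from the comment above, written out as CSV with explicit empty
# cells. Which column each value falls in is an assumption.
PROFILE_CSV = """\
Shape_ID,Property_ID,VType,VConstraints,VShape_ID
@book,,,,
,rdf:type,URI,sdo:Book,
,sdo:creator,URI,,@person
@person,,,,
,rdf:type,URI,sdo:Person,
"""

PROPERTY_COLUMNS = ("Property_ID", "VType", "VConstraints", "VShape_ID")


def read_shapes(text):
    """Group property rows under the most recent non-empty Shape_ID."""
    shapes = {}
    current = None
    for row in csv.DictReader(io.StringIO(text)):
        if row["Shape_ID"]:
            # A non-empty Shape_ID starts (or resumes) a shape.
            current = row["Shape_ID"]
            shapes.setdefault(current, [])
        if row["Property_ID"]:
            if current is None:
                raise ValueError("property row appears before any Shape_ID")
            shapes[current].append({key: row[key] for key in PROPERTY_COLUMNS})
    return shapes


shapes = read_shapes(PROFILE_CSV)
for shape_id, rows in shapes.items():
    print(shape_id, [r["Property_ID"] for r in rows])
```

Read this way, the profile says little more than "there are books and people, and a book's creator is a person", which is the minimal statement asked about in the issue.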

tombaker commented 4 years ago

I also see no obvious way to express the idea that all properties must be properties in the Schema.org namespace.

philbarker commented 4 years ago

@tombaker

> My guess at what you mean is that it should be possible to say, simply, that some given metadata describes instances of, say, sdo:Book and sdo:Person.

Yes, that's about right. I think it is simplest to phrase this as a metadata creation use case: when creating metadata for a book, use sdo:Book to say what type of thing it is, without any limit on the properties. This relates to what I listed as the first requirement for use case #13: Describing Course and Curriculum Materials, though that is a much more complex use case. There is a harder problem when receiving data: knowing which part of the profile to check entity descriptions against. Knowing to check anything typed sdo:Book against @Book might be part of the solution to this.
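A rough sketch of that last idea, in Python: build an index from the class named in each shape's rdf:type row to the shape ID, then use an incoming entity's types to pick the shapes to check it against. The in-memory profile layout, the function names, and the sample entities are all made up for illustration; nothing here is an agreed part of the CSV model.

```python
# Hypothetical in-memory form of the profile: shape ID -> list of
# (property, class constraint) pairs. None means "no constraint given".
PROFILE = {
    "@book": [("rdf:type", "sdo:Book"), ("sdo:creator", None)],
    "@person": [("rdf:type", "sdo:Person")],
}


def class_to_shape(profile):
    """Index shapes by the class named in their rdf:type row."""
    index = {}
    for shape_id, rows in profile.items():
        for prop, constraint in rows:
            if prop == "rdf:type" and constraint:
                index[constraint] = shape_id
    return index


def shapes_for(entity, index):
    """Return the shapes whose rdf:type constraint matches the entity's types."""
    return [index[t] for t in entity.get("rdf:type", []) if t in index]


index = class_to_shape(PROFILE)

# Made-up entity descriptions: property -> list of values.
book = {"rdf:type": ["sdo:Book"], "sdo:name": ["An example title"]}
event = {"rdf:type": ["sdo:Event"]}

print(shapes_for(book, index))
print(shapes_for(event, index))
```

An entity whose type is not mentioned in the profile matches no shape; whether that should count as an error or simply be ignored is left open here.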

What you show is pretty much where we are. I wouldn't mind changing Property_ID in the table to Term_ID and using that to say that what we call @Book in the application profile relates to what is called schema:Book in the base spec/vocabulary being profiled; but I think we've been through that argument a lot. If you recall, an early example[1] I tried had separate sheets for the vocabularies, classes and profiles being used; that might be a possible future expansion.

As for expressing the idea that all properties must be properties in the schema.org namespace and, further, which properties can be used for which entity type without listing them, I can think of two ways of doing that:

  1. We could have "closed" application profiles, where only the explicitly declared namespaces may be used. That is one more thing to think about with respect to metadata describing the AP.
  2. The schema.org schema definition says which properties are expected for entities of which class (to the extent that schema:domainIncludes limits this), so the AP doesn't have to.
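To make the first option concrete, here is a small Python sketch of a "closed" check: any property whose prefix is not among the declared namespaces gets reported. The set of allowed prefixes, the function names, and the sample entity are assumptions for illustration; the CSV model under discussion does not currently define a way to declare a profile as closed.

```python
# Assumed declaration of the namespaces a "closed" profile permits.
ALLOWED_PREFIXES = {"sdo", "rdf"}


def prefix_of(term):
    """Return the prefix of a prefixed name like 'sdo:name', or None."""
    prefix, sep, _ = term.partition(":")
    return prefix if sep else None


def undeclared_properties(entity, allowed):
    """List the properties whose prefix is not in the allowed set."""
    return sorted(p for p in entity if prefix_of(p) not in allowed)


# Made-up incoming description: property -> list of values.
incoming = {
    "rdf:type": ["sdo:Book"],
    "sdo:name": ["An example title"],
    "dct:title": ["An example title"],
}

print(undeclared_properties(incoming, ALLOWED_PREFIXES))
```

The second option needs no such check in the profile itself, since it defers to what the schema.org definitions already say about which properties are expected for which class.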

I don't think there is anything here that has to be done for the first iteration of simple profiles; however, there are things that, if done now, will inhibit future choices. For example, using "property id" as a column heading means that any statement about selecting class terms has to use a different ID column.

[1] https://docs.google.com/spreadsheets/d/1sj_3bLBy1vtMimlJuiw54BsgmUMcTb9Xq8E9zyUz7zo/edit#gid=71724746

kcoyle commented 4 years ago

I see being able to designate one or more namespaces as valid, without specifying properties, as a useful filter for folks receiving metadata from outside sources. Since anyone can say anything etc. this would be a way to filter out properties that your system may not know how to handle. We do have use cases that are based on validating incoming metadata that one does not control. Presumably more than one namespace could be included in the profile.

I see it as less useful for data creation, since a designation like "must be schema.org" doesn't seem to require a profile, although I wouldn't say that we should prevent someone from creating such a profile if it's what they need.

philbarker commented 4 years ago

> I see it as less useful for data creation, since a designation like "must be schema.org" doesn't seem to require a profile

When the schema is as big as schema.org, and you want to automatically create forms for data input (say to describe books), it's useful to have a list that reduces the number of classes you need to handle.

kcoyle commented 4 years ago

> When the schema is as big as schema.org, and you want to automatically create forms for data input (say to describe books), it's useful to have a list that reduces the number of classes you need to handle.

I was responding to the statement by Tom that there isn't a way to limit all properties to a single namespace. Sorry, I should have been clearer about it. I do think this is something that is useful for ingest of foreign metadata. I'm going to hold that for discussion of the next extensions to the simple template.

When you say the above, do you mean the class declarations in the instance data, or the domains defined in the vocabulary? If the latter, we haven't talked about how the properties in our profile interact (or not) with the vocabularies they are taken from. There was a question about whether one must use the label from the vocabulary (I hope not!). So we will take this up at some point.

philbarker commented 4 years ago

Sorry @kcoyle, I misunderstood.

I mean a constraint on the classes that can be used in instance data (while still conforming to the schema definition of the base vocabulary). I also hope we can override labels.

kcoyle commented 2 years ago

Phil agrees this is "out of scope", though it may relate to separate rows for shapes.