dcmi / dcap

DC Tabular Application Profile - supporting materials

Define SHAPE #73

Closed: kcoyle closed this issue 2 years ago

kcoyle commented 3 years ago

We need a good (and terse) definition for SHAPE. Right now the primer document has:

"In the development of validation concepts for RDF data the emphasis is on the set of statements that specify how an entity is described. These statements combine into groups called shapes, where a shape defines the structure that applications can expect to find in a view over a piece of data."

philbarker commented 3 years ago

Shape: a group of statements in an application profile that define how an entity is described in metadata.

(it partly depends on how 'statement' is defined, but I would like to distinguish between statements in an application profile, i.e. rows in the csv, and metadata statements, e.g. RDF triples)

kcoyle commented 3 years ago

Here's where I'm working on term definitions (I'll send out email shortly), but I realize now that I haven't yet given a definition to statement, so we can work on SHAPE and STATEMENT concurrently.

I like the "group of statements". We have run into situations where the shape isn't much of an entity, such as:

property is: dc:subject (a literal) or dct:subject (a URI)

has subject -> Mandatory -> @subject

@subject
  dc:subject - optional -> literal
  dct:subject - optional -> URI

... so maybe say something like "entity or functional set of terms ....". But I'm also ok with defining it as an entity and leaving the pseudo-entity cases unmentioned, although folks will discover them as they work.

philbarker commented 3 years ago

I like the "group of statements". We have run into situations where the shape isn't much of an entity, such as:

A shape isn't ever an entity; it defines how some entities are described. I think that covers a functional set of properties if you are happy with the object of dc:subject being "an entity".

Quick refinement of what I suggested: Shape: a group of statements in an application profile that define how certain entities are described in metadata.

We could just say that a "shape is a group of statements in an application profile", though not all groups of statements are shapes, and this may just push the problem from one definition to another, as I bet we'll run into the "entity" problem when we define what statements relate to.

tombaker commented 3 years ago

@philbarker @kcoyle I agree with Phil that we should be careful not to confuse shapes (which describe the structure and content of metadata) with entities (things that are described by that metadata). I agree that the wording "that define how certain entities are described in metadata" avoids that confusion, but "statements", on its own, seems rather abstract and risks confusion with "metadata statements" (RDF triples).

Ruben Verborgh has one of the best generic definitions I have seen: "A shape defines the fields and structure that client and apps can expect to find in a view over a piece of data." I'm not entirely happy with "fields", but I like the way this definition emphasizes "structure" that one "can expect to find in a view over a piece of data".

How about: "A shape is a set of statements that enumerate the structure and contents that users or applications can expect to find in a view over all or part of a metadata description."

ericprud commented 3 years ago

Unless you want to sacrifice clarity for accuracy, maybe you can trim the view out:

"A shape enumerates the structure and contents that users or applications can expect to find in a metadata description."

I guess "metadata description" is because that's your audience? Shapes don't favor describing metadata any more than any other data.

kcoyle commented 3 years ago

A shape isn't ever an entity, it defines how some entities are described.

I was referring to the thing in the real world that people will think of as entities, but obviously didn't word that well. It's the link between what is in people's heads when they think about what they are describing, and what shape the metadata takes. There will be shapes in the metadata that don't conform to real-world "things" in the minds of (especially naive) profile developers. I think it's ok not to go into that, because the more experienced developers will do what they need to do and won't be referring back to definitions.

How about: "A shape is a set of statements that enumerate the structure and contents that users or applications can expect to find in a view over all or part of a metadata description."

Tom, I think what you say here is accurate, but I wonder if we can't bring it down a bit from abstract toward something more down to earth. I do like "group of statements" rather than "set" since "set" implies something mathematical and the actual groups may not be rigorous. Ditto "enumerate" I'm afraid. I'm thinking something much simpler such as

Creation view: "A shape is a group of statements that together represent something described in the metadata."

Ingest view: "A shape is a group of statements that together represent a description of something to be found in the metadata."

I don't like "something" (it could be "some thing") but I don't want to talk about entities. However, the shape does represent some logical grouping that is "about" something, often some real world thing that is being described in metadata. The other thing is the "creation" view vs the "ingest" view. "... expect to find" is the ingest view. We need to hit the creation view as well as the ingest view.

tombaker commented 3 years ago

I guess "metadata description" is because that's your audience? Shapes don't favor describing metadata any more than any other data.

Point taken, and I have often made this point myself (and thus hesitated). The context here is "application profiles", which is typically synonymous with "metadata application profiles". I'm on the fence about this, because the CSV interface might well be useful to people who do not think of their data as metadata.

tombaker commented 3 years ago

I do like "group of statements" rather than "set" since "set" implies something mathematical and the actual groups may not be rigorous. Ditto "enumerate" I'm afraid.

I'm okay with "group of statements" and avoiding "enumerate".

Creation view: "A shape is a group of statements that together represent something described in the metadata."

The problem is that if one substitutes "shape" with "sub-graph" and "metadata" with "RDF graph", this statement could almost read as a description of RDF due to the ambiguity of "statements" and "something described in the metadata".

Ingest view: "A shape is a group of statements that together represent a description of something to be found in the metadata."

What I like about the RubenV-style definition is the reference to things that one "expects to find in the data".

Another attempt: "A shape is a group of statements that specify how a given entity is described in a given set of data or metadata in terms of the properties, values, and relations among entities that users or applications can expect to find in that data."

philbarker commented 3 years ago

I agree with Tom about the "statement" being problematic due to potential confusion with RDF statements in the data. It might even be worth considering calling the rows in the CSV something else (but what?). I used "statements in an application profile" to avoid this. I suggest something like:

Application Profile Statement: a row in the csv that ...[whatever]

Shape: a group of Application Profile Statements that ...[whatever]

On Karen's suggestions, I prefer "represent how something may be described in metadata" (not "represent something described in the metadata", because the metadata represents something described, while the AP represents the description; I'm sure that is what Karen meant, but if you didn't know what was meant, it is open to misinterpretation). I think "may be" covers both creation and ingest.

So

Application Profile Statement: a row in the csv that ...[whatever]

Shape: a group of Application Profile Statements that represent how something may be described in metadata

tombaker commented 3 years ago

I agree with Tom about the "statement" being problematic due to potential confusion with RDF statement in the data. It might even be worth considering calling the rows in the CSV something else (but what). I used "statements in an application profile" to avoid this.

FWIW I ended up renaming the Statement class in my script to CSVRow because I was getting confused myself...
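To make the row-versus-shape distinction concrete, here is a minimal Python sketch of a `CSVRow` class and a function that groups rows into shapes. It is not the script mentioned above. The column names (`shapeID`, `propertyID`, `mandatory`, `valueNodeType`), the `@book` shape, and the rule that "rows sharing a shapeID form one shape" are all hypothetical, chosen for illustration. The `@subject` rows loosely follow the dc:subject / dct:subject example earlier in this thread.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical tabular profile; column names are illustrative assumptions.
TAP_CSV = """shapeID,propertyID,mandatory,valueNodeType
@book,dct:title,true,literal
@book,dct:subject,false,IRI
@subject,dc:subject,false,literal
@subject,dct:subject,false,IRI
"""


@dataclass
class CSVRow:
    """One row of the profile: the constraints on a single property."""
    shape_id: str
    property_id: str
    mandatory: bool
    value_node_type: str


def group_into_shapes(text: str) -> dict[str, list[CSVRow]]:
    """Model a shape as the group of rows that share a shapeID."""
    shapes: dict[str, list[CSVRow]] = defaultdict(list)
    for record in csv.DictReader(io.StringIO(text)):
        row = CSVRow(
            shape_id=record["shapeID"],
            property_id=record["propertyID"],
            mandatory=record["mandatory"].strip().lower() == "true",
            value_node_type=record["valueNodeType"],
        )
        shapes[row.shape_id].append(row)
    return dict(shapes)


shapes = group_into_shapes(TAP_CSV)
for shape_id, rows in shapes.items():
    print(shape_id, [r.property_id for r in rows])
# prints:
# @book ['dct:title', 'dct:subject']
# @subject ['dc:subject', 'dct:subject']
```

In this sketch a row carries more than a property and a value (here, cardinality and value type), while a shape is nothing more than the rows grouped under one identifier.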

So

Application Profile Statement: a row in the csv that ...[whatever]

Shape: a group of Application Profile Statements that represent how something may be described in metadata

Could we perhaps avoid "statement" altogether by calling them Property-Value Pairs? This is in fact what they were called in the DCMI Abstract Model (2007), the summary of which starts with the sentence "Each described resource is described using one or more property-value pairs."

Phil's short-form definition (modified) would read: "a group of property-value pairs that represent how something may be described in metadata".

This is fairly close to the start of my (modified) definition: "a group of property-value pairs that specify how a given entity is described in a given set of data or metadata...".

The longer definition went on to say: "...in terms of the properties, values, and relations among entities that users or applications can expect to find in that data." I think it is good to spell out that these are things "that users or applications can expect to find", but if we mention properties and values in the main definition, these other things could go into the explanatory text.

While we have said that the minimal application profile specifies the properties, and nothing more, I would argue that the properties are all paired with implicit values, even if those values are completely unconstrained.

kcoyle commented 3 years ago

Phil's short-form definition (modified) would read: "a group of property-value pairs that represent how something may be described in metadata".

I think of property-value pairs as being descriptive of the instance data that our profile is describing, and each is just a property and a value. Calling our template row that rather short-changes the richness of what the row provides, such as the cardinality of the property and the value type and constraints. That's why "statement" works better for me: it sounds like it includes more. "Statement" comes from the DSP. Not that we have to stick with the DSP terms, but it does provide a foundation. The DSP defines a statement simply by what it contains:

Statement templates, which contain all the constraints on the property, value strings, vocabulary encoding schemes, etc. that apply to a single kind of statement.

I suppose we could go that route with statements and give a definition that is mainly a description, then define shape as a group of statements that describe a single "thing". (But with better wording)

philbarker commented 3 years ago

I think of property-value pairs as being descriptive of the instance data that our profile is describing, and each is just a property and a value. To call our template row that rather short changes the richness of what the row provides,

+1

"Statement" comes from the DSP

I think the DSP uses "statement template" to describe the form that statements should take, per @kcoyle's quoted text; statements being made about entities in the instance data. I like the language of "templates" for defining the forms and shapes of data, but remember that Karen doesn't.

How about we refer to a row as the "statement rules" to which properties in instance data must conform? Or how about "property profile": the part of an application profile defining how a property should be used?

tombaker commented 3 years ago

@kcoyle I take the point about "property-value pairs" as being of the instance data, though I think we have that problem with "statement" as well.

To allude to the richness of "statements", would it help to say something like:

Shape: a group of property and value constraints on how something may be described in metadata.

tombaker commented 3 years ago

Constraint Statement: a group of property and value constraints on how something may be described in metadata.

Shape: A group of constraint statements.

tombaker commented 3 years ago

Statement Constraints: A group of property and value constraints on Metadata Statements.

tombaker commented 3 years ago

Metadata Statement: a property-value pair in instance data that...

tombaker commented 3 years ago

@kcoyle Do we want to continue this discussion in this issue thread?

kcoyle commented 2 years ago

See style guide.