dret / I-D

Internet Drafts I've authored or contributed to.

RFC 6906: Also additional structural constraints? #94

Open RubenVerborgh opened 6 years ago

RubenVerborgh commented 6 years ago

RFC 6906 states:

For the purpose of this specification, a profile can be described as additional semantics that can be used to process a resource representation, such as constraints, conventions, extensions, or any other aspects that do not alter the basic media type semantics.

May I suggest "additional semantics and/or structural constraints"?

dret commented 6 years ago

On 2018-01-03 05:12, Ruben Verborgh wrote:

RFC 6906 states:

For the purpose of this specification, a profile can be described as additional semantics that can be used to process a resource representation, such as constraints, conventions, extensions, or any other aspects that do not alter the basic media type semantics.

May I suggest "additional semantics and/or structural constraints"?

i think i agree, but i am wondering why you're just mentioning structural constraints? to me a profile might also just constrain values in some way, leaving the underlying media type's structural properties as they are.

also, the semantics are just a (non-formalized) side-effect of the constraints, right? what about this:

"additional semantics (represented through structural and/or value-based constraints)"

RubenVerborgh commented 6 years ago

i am wondering why you're just mentioning structural constraints?

Because structural constraints are not necessarily “additional semantics”. I.e., we could imagine creating a profile that imposes certain structural constraints on a JSON file, but no semantics for them. Such a profile would not fall under the definition of “a profile can be described as additional semantics”.

dret commented 6 years ago

On 2018-01-06 01:14, Ruben Verborgh wrote:

i am wondering why you're just mentioning structural constraints?

Because structural constraints are not necessarily “additional semantics”. I.e., we could imagine creating a profile that imposes certain structural constraints on a JSON file, but no semantics for them. Such a profile would not fall under the definition of “a profile can be described as additional semantics”.

that's a good point. but aren't the semantics just the (intentional) "side-effect" of the constraints? i am mostly thinking that what defines a profile are the constraints (that i have to adhere to when using that profile). if these "mean" something, that is what is important for users of the profile, but the profile really is just the messenger.

so i am wondering whether it actually would be more accurate to define profiles by their virtue of adding constraints, and then explaining that those constraints most often are put into place to back semantics that are based on those constraints being adhered to.

RubenVerborgh commented 6 years ago

but aren't the semantics just the (intentional) "side-effect" of the constraints?

Might be, but not necessarily. Just knowing that data has a certain shape is already useful.

so i am wondering whether it actually would be more accurate to define profiles by their virtue of adding constraints

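Agree on a profile defining constraints. Here are (non-final) definitions we came up with in the context of the DXWG working group:

  • A media type is a set of syntactic constraints, structural constraints, and/or semantic interpretations that can be used to serialize information content.
  • A profile is a set of structural constraints and/or semantic interpretations that can apply to information content in addition to constraints and interpretations mandated by a media type.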

So we basically identified three groups of constraints (syntax / structure / semantics), of which media types can influence all three, but profiles only the latter two.

dret commented 6 years ago

On 2018-01-08 02:18, Ruben Verborgh wrote:

but aren't the semantics just the (intentional) "side-effect" of the constraints?

Might be, but not necessarily. Just knowing that data has a certain shape is already useful.

true. which is why i am suggesting to focus on the constraints, and then just treat the semantics as the usual explanation of why people are doing that.

so i am wondering whether it actually would be more accurate to define profiles by their virtue of adding constraints

Agree on a profile defining constraints. Here are (non-final) definitions we came up with in the context of the DXWG working group:

  • A media type is a set of syntactic constraints, structural constraints, and/or semantic interpretations that can be used to serialize information content.

that's kind of odd. by definition, syntax is about structure, so at least for the textbook definitions of these two words it seems odd to separate those two. what definitions were used to come up with these two as distinct categories?

https://en.wikipedia.org/wiki/Syntax

  • A profile is a set of structural constraints and/or semantic interpretations that can apply to information content in addition to constraints and interpretations mandated by a media type.

sounds reasonable (minus the general syntax/structure oddity). i am struggling to understand how a profile could be defined/useful when it's just semantics, but without making that tangible via any constraints. then it seems a profile isn't actionable in any shape or form and i fail to see its utility as a "formal" profile.

RubenVerborgh commented 6 years ago

that's kind of odd. by definition, syntax is about structure, so at least for the textbook definitions of these two words it seems odd to separate those two. what definitions were used to come up with these two as distinct categories?

Syntax: JSON, XML, … Structure: document with these specific elements

For instance, HAL has a JSON syntax with the extra structural constraints of having _links and _embedded elements.
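A minimal Python sketch of that split, assuming a simplified reading of HAL that follows this thread's description (the HAL draft itself has more rules and may be more lenient about `_links`): the syntax level is whatever `json.loads` accepts, and the structural level is a hypothetical `check_hal_structure` function that only ever sees the parsed model.

```python
import json


def parse_json(data: bytes):
    """Syntax level: a plain JSON parser accepts or rejects the bytes."""
    return json.loads(data)


def check_hal_structure(doc) -> list[str]:
    """Structure level (hypothetical, simplified HAL check) on the parsed model."""
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    # Assumption taken from this thread: HAL requires a "_links" object.
    if not isinstance(doc.get("_links"), dict):
        problems.append("_links must be present and be an object")
    # "_embedded" is optional, but must be an object when present.
    if "_embedded" in doc and not isinstance(doc["_embedded"], dict):
        problems.append("_embedded, if present, must be an object")
    return problems


hal = b'{"_links": {"self": {"href": "/orders/1"}}, "total": 30}'
plain = b'{"total": 30}'
print(check_hal_structure(parse_json(hal)))    # []
print(check_hal_structure(parse_json(plain)))  # ['_links must be present and be an object']
```

Both inputs pass the same JSON parser; only the second fails the structural check.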

dret commented 6 years ago

On 2018-01-08 23:01, Ruben Verborgh wrote:

that's kind of odd. by definition, syntax is about structure, so at least for the textbook definitions of these two words it seems odd to separate those two. what definitions were used to come up with these two as distinct categories?

Syntax: JSON, XML, … Structure: document with these specific elements

For instance, HAL has a JSON syntax with the extra structural constraints of having _links and _embedded elements.

that just seems a rather non-standard use of long-established terminology. syntax defines structure (that's pretty much all it does): it defines how the symbols of a language are used to represent structured data.

it might be a bit easier for others to tune into all of this if it used words more in the way they are commonly used. HAL uses JSON syntax and structure and imposes additional constraints on top of that, but that's very different from saying that syntax and structure are different things.

(i just looked into the DXWG profile-related pages and was amazed by the fact that they don't even mention RFC 6906. i tend to agree with @handrews on all of this: at least acknowledge what's out there instead of reinventing a new terminology world. or if you do, maybe it's easier for all involved to pick new terms.)

RubenVerborgh commented 6 years ago

syntax defines structure (that's pretty much all it does),

Low-level structure, yes. The fact that keys in a JSON document are strings.

I'd say that the fact that JSON keys are delimited by " is a syntactical constraint, and the fact that HAL requires a _links element is a structural constraint.

The difference is important because regular JSON documents, HAL, JSON-LD, etc. can all be parsed by a JSON parser. So they share the same syntactic constraints. On top of that, HAL and JSON-LD have additional structural constraints.

it might be a bit easier for others to tune into all of this if it used words more in the way they are commonly used. HAL uses JSON syntax and structure and imposes additional constraints on top of that, but that's very different from saying that syntax and structure are different things.

I'm open to a more accurate naming. But to me this is the crucial thing in profiles: the media type defines what parser to use, and the profile defines what structural and semantic assumptions you're allowed to make. So I want to distinguish somehow in a meaningful way.

(i just looked into the DXWG profile-related pages and was amazed by the fact that they don't even mention RFC 6906.

They're just drafts of how individuals in the group think profiles should be defined. Since RFC 6906 does not include a definition of a profile, it's probably not linked from that page.

i tend to agree with @handrews on all of this: at least acknowledge what's out there instead

Obviously RFC 6906 will have a place in the end result.

dret commented 6 years ago

On 2018-01-08 23:20, Ruben Verborgh wrote:

syntax defines structure (that's pretty much all it does),

Low-level structure, yes. The fact that keys in a JSON document are strings.

well, whatever it takes to define the media type. exactly that, and not more. that's what syntax is all about.

I'd say that the fact that JSON keys are delimited by " is a syntactical constraint, and the fact that HAL requires a _links element is a structural constraint.

apparently that's what you say. all i want to point out is that this is a rather specific way of using established terminology, and it may not help if you want others to understand things. syntax is structure. that's what syntax is all about.

The difference is important because regular JSON documents, HAL, JSON-LD, etc. can all be parsed by a JSON parser. So they share the same syntactic constraints. On top of that, HAL and JSON-LD have additional structural constraints.

they have additional constraints, yes. the term "structural" here implies a precision and a distinction that doesn't exist. you could easily have cases with no "structural" constraints and just value ones. those would be equally valid examples.
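A sketch of such a value-only profile, in Python. The `currency` key and the allowed codes are invented for illustration: the hypothetical profile adds no keys and no nesting, and only narrows which values are acceptable where a `currency` key already appears.

```python
import json

# Hypothetical value-only profile: "currency" values must come from this set.
ALLOWED_CURRENCIES = {"EUR", "USD"}


def check_value_profile(doc) -> list[str]:
    """Report every "currency" value outside ALLOWED_CURRENCIES, at any depth.

    No key is required and no shape is imposed; only values are restricted.
    """
    problems = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                ok = isinstance(value, str) and value in ALLOWED_CURRENCIES
                if key == "currency" and not ok:
                    problems.append(f"{path}/{key}: {value!r} not allowed")
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}/{i}")

    walk(doc, "")
    return problems


order = json.loads('{"items": [{"price": 3, "currency": "EUR"}, {"price": 5, "currency": "GBP"}]}')
print(check_value_profile(order))  # ["/items/1/currency: 'GBP' not allowed"]
```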

anyway. this discussion (and thanks for it!) makes me confident that profiles should very strictly talk about being constraints, without qualifying that in any further way.

RubenVerborgh commented 6 years ago

I guess the only thing I wanted to do was to say that “profiles should not require clients to use a different parser”. So (only) the media type determines the parser.

Coming back to the same example: HAL and JSON-LD both use a JSON parser, but then make some additional assumptions about the shape of the resulting in-memory representation. (So they can be considered profiles on top of JSON instead of media types; and had the technology been available, at least HAL shouldn't have required its own media type). JSON-LD will not throw a "syntax error" on a valid JSON document—even if that document is invalid JSON-LD.
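The division of labor described here can be sketched in Python as follows. The `PARSERS` registry, `SyntaxLevelError`, and `looks_like_jsonld` are hypothetical names, and checking for a top-level `@context` is only a crude stand-in for JSON-LD processing, not real validation. The point is that the media type alone selects the byte-level parser, and a document that fails the profile-level check is still valid JSON.

```python
import json
from xml.etree import ElementTree

# Hypothetical registry: the media type alone selects the byte-level parser.
PARSERS = {
    "application/json": json.loads,
    "application/xml": ElementTree.fromstring,
}


class SyntaxLevelError(Exception):
    """Raised when the bytes cannot be parsed for the declared media type."""


def parse(media_type: str, data: bytes):
    parser = PARSERS[media_type]
    try:
        return parser(data)
    except (json.JSONDecodeError, ElementTree.ParseError) as exc:
        raise SyntaxLevelError(str(exc)) from exc


def looks_like_jsonld(doc) -> bool:
    """Crude, hypothetical post-parse check: a top-level "@context" entry."""
    return isinstance(doc, dict) and "@context" in doc


doc = parse("application/json", b'{"name": "Ruben"}')
print(looks_like_jsonld(doc))  # False: valid JSON, only the profile-level assumption fails

try:
    parse("application/json", b"{name: Ruben}")
except SyntaxLevelError:
    print("rejected by the JSON parser itself")
```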

I don't mind using other terminology. I understand we're in disagreement regarding syntax/structure (and I'm willing to change).

the term "structural" here implies a precision and a distinction that doesn't exist.

So do we also disagree about the parser being a meaningful distinction? That is, is there some difference between the kind of constraints that differentiate application/json from text/plain on the one hand, and the kind that differentiate HAL from JSON on the other? And do the first kind of constraints belong to a media type and not a profile? Or are all constraints the same to you?

If there's a difference, and I think there is, I'm looking for a term to indicate that. If there is no difference, what distinguishes a profile from a media type?

dret commented 6 years ago

On 2018-01-09 00:01, Ruben Verborgh wrote:

If there's a difference, and I think there is, I'm looking for a term to indicate that. If there is no difference, what distinguishes a profile from a media type?

that's an easy one. media types are created out of thin air and are self-contained. profiles are always based on a media type.

RubenVerborgh commented 6 years ago

Is HAL a profile or a media type? JSON-LD? RDF/XML?

RubenVerborgh commented 6 years ago

Also, can profiles apply to only a single media type then?

dret commented 6 years ago

On 2018-01-09 00:07, Ruben Verborgh wrote:

Is HAL a profile or a media type? JSON-LD? RDF/XML?

clearly these are media types, as they define themselves as such.

RubenVerborgh commented 6 years ago

clearly these are media types, as they define themselves as such.

conflicts with

media types are created out of thin air and are self-contained

dret commented 6 years ago

On 2018-01-09 00:08, Ruben Verborgh wrote:

Also, can profiles apply to only a single media type then?

https://tools.ietf.org/html/rfc6906#section-3

"While this specification associates profiles with resource representations, creators and users of profiles MAY define and manage them in a way that allows them to be used across media types; thus, they could be associated with a resource, independent of their representations (i.e., using the same profile URI for different media types). However, such a design is outside of the scope of this specification, and clients SHOULD treat profiles as being associated with a resource representation."

dret commented 6 years ago

On 2018-01-09 00:19, Ruben Verborgh wrote:

clearly these are media types, as they define themselves as such.

conflicts with

media types are created out of thin air and are
self-contained

not really. whether media types include or transclude things they may be built on is a pure technicality. you could easily define all of these in a way that has none of the dependencies that you probably refer to.

RubenVerborgh commented 6 years ago

Okay, then it's clear that we have different starting points. I don't see such transclusion as a technicality: for me, a media type is associated with a parser. So if something does not require a different parser, then it shouldn't be a media type. That's the reason why I think profiles are useful: you can add additional assumptions that can be made after the parsing stage.

The benefit of seeing HAL and JSON-LD as profiles of JSON is that they can be combined (an argument I've discussed at https://ruben.verborgh.org/articles/fine-grained-content-negotiation/#possible-but-inadequate-workarounds-p-1). That is, one can perfectly imagine a JSON document that adheres to both the HAL constraints and the JSON-LD constraints, but using MIME types for these two instead of profiles prevents a client from using that.
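A sketch of that combination, reusing the same kind of simplified checks (both are assumptions, not full HAL or JSON-LD validation; the profile names `hal` and `jsonld` are placeholders): one JSON document, parsed once, can satisfy several independent sets of constraints, which a single media type per response cannot express.

```python
import json


def satisfies_hal(doc) -> bool:
    # Simplified check (assumption): a top-level "_links" object.
    return isinstance(doc, dict) and isinstance(doc.get("_links"), dict)


def satisfies_jsonld(doc) -> bool:
    # Crude stand-in (assumption): a top-level "@context" entry.
    return isinstance(doc, dict) and "@context" in doc


# Hypothetical profile names mapped to their post-parse checks.
PROFILE_CHECKS = {"hal": satisfies_hal, "jsonld": satisfies_jsonld}


def profiles_satisfied(doc) -> set[str]:
    """Return the names of all profiles whose constraints the document meets."""
    return {name for name, check in PROFILE_CHECKS.items() if check(doc)}


body = b'''{
  "@context": {"name": "http://schema.org/name"},
  "_links": {"self": {"href": "/people/1"}},
  "name": "Ruben"
}'''
doc = json.loads(body)  # one JSON parse, regardless of how many profiles apply
print(sorted(profiles_satisfied(doc)))  # ['hal', 'jsonld']
```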

I hope this also shows why it's important for me to distinguish between constraints that affect parsing (which I referred to as “syntax”) and others (which I—perhaps inaccurately—referred to as “structure”).

However, there's something more fundamental:

whether media types include or transclude things they may be built on is a pure technicality.

With that definition, any profile that is tied to one specific media type could equally be considered a media type itself, given that it then transcludes the first media type.

While this specification associates profiles with resource representations, creators and users of profiles MAY define and manage them in a way that allows them to be used across media types; thus, they could be associated with a resource, independent of their representations (i.e., using the same profile URI for different media types).

I have an issue with that phrasing: a profile being used across media types does not necessarily mean that it is associated with the resource. Given media types X, Y and profiles A, B, I might be able to represent a resource as X+A, X+B, Y+A, Y+B.
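The combinatorics of that example, sketched in Python with the same placeholder names: each pair is its own representation, and the profile travels with the representation rather than with the resource.

```python
from itertools import product

# Placeholder names from the comment above: two media types, two profiles.
media_types = ["X", "Y"]
profiles = ["A", "B"]

# Each (media type, profile) pair is a distinct representation of one resource.
representations = list(product(media_types, profiles))
print([f"{m}+{p}" for m, p in representations])  # ['X+A', 'X+B', 'Y+A', 'Y+B']
```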

dret commented 6 years ago

On 2018-01-09 00:41, Ruben Verborgh wrote:

Okay, then it's clear that we have different starting points. I don't see such transclusion as a technicality: for me, a media type is associated with a parser. So if something does not require a different parser, then it shouldn't be a media type. That's the reason why I think profiles are useful: you can add additional assumptions that can be made after the parsing stage.

what's a "parser" for you? you could argue that atom shouldn't be a media type because an XML parser is all you need? or you could argue because you'd actually want a feed to be parsed into feed-level structures, meaning that you need a parser? again, i think you're implying a precision/distinction here that doesn't exist (at the level of clarity you seem to be after).

i've certainly seen people doing both: processing feeds as XML and then writing their own XPaths assuming that the XML is a feed. or processing feeds with an integrated package that consumes raw XML and spits out some "feed DOM" that already addresses some of the peculiarities of feeds, such as how to handle/derive "author" info.

for podcasts for example you would have three levels of parsing/models: first parse the XML to get the feed. that gives you an XML model. then you can interpret the feed structures to get to a feed model. and then you can interpret the podcast structures to get to a podcast model. how all of this is implemented is opaque. it's what pretty much always goes on: different levels of abstraction layered on top of each other. profiles just say that there's an additional level, that's all there is to it.
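The three levels described above can be sketched with Python's standard `xml.etree.ElementTree`. This uses Atom rather than RSS, and treats an entry with a `rel="enclosure"` link as a podcast episode; that is a simplification for illustration, not how any particular podcast library works. Only the first function touches bytes; the other two interpret an in-memory model.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


@dataclass
class Entry:
    title: str
    links: list[tuple[str, str]]  # (rel, href) pairs


def xml_model(data: bytes) -> ET.Element:
    """Level 1: generic XML parsing of the bytes."""
    return ET.fromstring(data)


def feed_model(root: ET.Element) -> list[Entry]:
    """Level 2: interpret Atom feed structures on the XML tree."""
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        # Atom treats a link without rel as rel="alternate".
        links = [(link.get("rel", "alternate"), link.get("href"))
                 for link in entry.findall(f"{ATOM}link")]
        entries.append(Entry(entry.findtext(f"{ATOM}title", default=""), links))
    return entries


def podcast_model(entries: list[Entry]) -> list[tuple[str, str]]:
    """Level 3 (simplified): entries with an enclosure link are episodes."""
    return [(e.title, href) for e in entries for rel, href in e.links if rel == "enclosure"]


feed = b'''<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Episode 1</title>
    <link rel="enclosure" href="https://example.org/ep1.mp3"/></entry>
  <entry><title>Blog post</title>
    <link href="https://example.org/post"/></entry>
</feed>'''
print(podcast_model(feed_model(xml_model(feed))))
# [('Episode 1', 'https://example.org/ep1.mp3')]
```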

The benefit of seeing HAL and JSON-LD as profiles of JSON, is that they can be combined (an argument I've discussed here https://ruben.verborgh.org/articles/fine-grained-content-negotiation/#possible-but-inadequate-workarounds-p-1). That is, one can perfectly imagine a JSON document that both adheres to the HAL constraints and the JSON-LD constraints—but using MIME types for these two instead of profiles, prevents a client from using that.

but they are media types, so there's little you can do.

I hope this also shows why it's important for me to distinguish between constraints that affect parsing (which I referred to as “syntax”) and others (which I—perhaps inaccurately—referred to as “structure”).

i still don't get that. i know that you want things to be clear-cut, but i cannot see a way how to see things that way without redefining what's out there already, and implying dualities that aren't quite as clear.

for example, you could have an "XML profile" that said attributes always must use quotes (and not apostrophes). that implies a specific parser (feature) and has no structural implications (given your definition of structure). wouldn't that be an acceptable profile?

I have an issue with that phrasing: a profile being used across media types does not necessarily mean that it is associated with the resource. Given media types X, Y and profiles A, B, I might be able to represent a resource as X+A, X+B, Y+A, Y+B.

nothing in RFC 6906 keeps you from doing that.

RubenVerborgh commented 6 years ago

what's a "parser" for you?

Something that processes a representation's stream of bytes into a higher-level model.

you could argue that atom shouldn't be a media type because an XML parser is all you need?

Indeed. An Atom document is an XML document conforming to the (to be defined) Atom profile. Unfortunately, it's defined differently because profiles didn't exist at the time. Yet all Atom libraries first parse the regular XML document, and only then start applying the specific Atom structural and semantic constraints.

or you could argue because you'd actually want a feed to be parsed into feed-level structures, meaning that you need a parser?

But that wouldn't be a parser of the representation sent by the server. It would be a convertor from an XML in-memory model to a list of feeds.

again, i think you're implying a precision/distinction here that doesn't exist (at the level of clarity you seem to be after).

Can you point me to one Atom implementation that doesn't parse the document as XML first? A HAL parser that doesn't parse JSON first? A JSON-LD parser that doesn't parse JSON first? If not, then I think the distinction is pretty clear.

different levels of abstraction layered on top of each other.

Yes, and to me that lowest level is the document type, such as XML or JSON. They have common parsers (as in "convertors from bytes to in-memory objects"). All the higher levels are profiles; they do not operate on the bytes in the representation.

i cannot see a way how to see things that way without redefining what's out there already

I don't intend to fix the past, but rather to make it easier and more flexible to define new things in the future. So if a new HAL 2.0 comes up, it could be defined as a profile on top of JSON, rather than introducing an entire new media type from scratch. That brings the benefit of transparently reusing parsers (as the content type remains JSON), and of being able to combine HAL 2.0 with other things.

for example, you could have an "XML profile" that said attributes always must use quotes (and not apostrophes). that implies a specific parser (feature) and has no structural implications (given your definition of structure). wouldn't that be an acceptable profile?

It would not be a profile to me, but a media type. Hence my definition of media type as a set of [byte-level] syntactic, [model-level] structural, and semantic constraints, and a profile as only [model-level] structural and semantic constraints but not [byte-level] syntax.
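One way to see why a quoting rule behaves differently from the other constraints discussed here, sketched with Python's standard library: after parsing, the quoting style is no longer visible, so the rule can only be checked on the raw bytes. The regex is a naive illustration (an assumption), not a conforming check.

```python
import re
from xml.etree import ElementTree as ET

double = b'<item id="42"/>'
single = b"<item id='42'/>"

# After parsing, the quoting style is gone: both inputs yield the same element.
tree_double, tree_single = ET.fromstring(double), ET.fromstring(single)
print(tree_double.tag == tree_single.tag and tree_double.attrib == tree_single.attrib)  # True

# So the rule must be enforced on the bytes. Naive (assumption): it ignores
# comments, CDATA sections, and text content that merely looks like an attribute.
SINGLE_QUOTED_ATTR = re.compile(rb"\s[\w:.-]+\s*=\s*'")


def uses_only_double_quotes(data: bytes) -> bool:
    return SINGLE_QUOTED_ATTR.search(data) is None


print(uses_only_double_quotes(double), uses_only_double_quotes(single))  # True False
```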

Given media types X, Y and profiles A, B, I might be able to represent a resource as X+A, X+B, Y+A, Y+B.

nothing in RFC 6906 keeps you from doing that.

Indeed, but my comment is that the phrasing seems to imply that, when profiles are used across media types, they are associated with the resource instead of the representation. I suggest changing the phrasing, as this is not necessarily the case.

Plus, this point is still open:

whether media types include or transclude things they may be built on is a pure technicality.

With that definition, any profile that is tied to one specific media type could equally be considered a media type itself, given that it then transcludes the first media type.

dret commented 6 years ago

On 2018-01-09 11:13, Ruben Verborgh wrote:

what's a "parser" for you?

Something that processes a representation's stream of bytes into a higher-level model.

there often are layered higher-level models. given this definition a parser can also parse bytes into a "feed DOM".

you could argue that atom shouldn't be a media type because an XML parser is all you need?

Indeed. An Atom document is an XML document conforming to the (to be defined) Atom profile. Unfortunately, it's defined differently because profiles didn't exist at the time. Yet all Atom libraries first parse the regular XML document, and only then start applying the specific Atom structural and semantic constraints.

this is not how RFC 6906 defines profiles. you may want to change reality to this, but (a) reality is different and hard to change, and (b) this would be some non-6906 profile concept to be used for this.

or you could argue because you'd actually want a feed to be parsed into feed-level structures, meaning that you need a parser?

But that wouldn't be a parser of the representation sent by the server. It would be a convertor from an XML in-memory model to a list of feeds.

maybe. who are we to decide how bits-on-the-wire get parsed into application models?

different levels of abstraction layered on top of each other.

Yes, and to me that lowest level is the document type, such as XML or JSON.

wouldn't the lowest level for both be unicode? i'd hope that few XML or JSON parsers implement unicode from scratch. but i don't know and i don't have to know.

i cannot see a way how to see things that way without redefining what's out there already

I don't intend to fix the past, but rather to make it easier and more flexible to define new things in the future. So if a new HAL 2.0 comes up, it could be defined as a profile on top of JSON,

again, that would be for a non-6906 profile concept.

for example, you could have an "XML profile" that said attributes always must use quotes (and not apostrophes). that implies a specific parser (feature) and has no structural implications (given your definition of structure). wouldn't that be an acceptable profile?

It would not be a profile to me, but a media type. Hence my definition of media type as a set of [byte-level] syntactic, [model-level] structural, and semantic constraints, and a profile as only [model-level] structural and semantic constraints but not [byte-level] syntax.

seems like our profile concepts are diametrically opposed.

nothing in RFC 6906 keeps you from doing that.

Indeed, but my comment is that the phrasing seems to imply that, when profiles are used across media types, they are associated with the resource instead of the representation. I suggest changing the phrasing, as this is not necessarily the case.

ok, can you maybe raise an issue for that or submit a PR? i think in most places the text is pretty clear that profiles constrain representations.

Plus, this point is still open:

whether media types include or transclude things they may be built on is a pure technicality.

With that definition, any profile that is tied to one specific media type, could equally be considered a media type itself, given that it then transcludes the first media type.

very true. you could take any profile and turn it into a media type, severing its connections with its foundation. but then you cannot conveniently treat a podcast as a feed anymore, which is why the profile concept fragments the landscape a little less.

RubenVerborgh commented 6 years ago

Alright, thanks for the discussion, @dret. I've learned that we indeed have something different in mind. The good thing is that I don't see an incompatibility with the phrasing as it currently is in RFC 6906, so I'll keep an eye on that in the future as well.

this is not how RFC 6906 defines profiles

RFC 6906 does not define a profile at the moment, and the text is compatible with the notion of a profile I propose (and I'm happy with that).

i'd hope that few XML or JSON parsers implement unicode from scratch.

That's a charset matter, and a separate concern with a separate header.

So if a new HAL 2.0 comes up, it could be defined as a profile on top of JSON,

again, that would be for a non-6906 profile concept.

Why? It is not incompatible with anything in 6906.

ok, can you maybe raise an issue for that or submit a PR?

Done in #95.

you could take any profile and turn it into a media type, severing its connections with its foundation.

Then this is the main reason why that concept of a profile is not of any use to me. The attraction of my notion of profiles is precisely that they offer something a media type cannot. It might need another name, though. Perhaps features (as in https://arxiv.org/pdf/1609.07108v2.pdf).

but then you cannot conveniently treat a podcast as a feed anymore, which is why the profile concept fragments the landscape a little less.

…which is why I'd want future HAL, Atom, etc. all to be profiles. Same (byte-level) parser, different application-level assumptions.

dret commented 6 years ago

On 2018-01-10 02:38, Ruben Verborgh wrote:

this is not how RFC 6906 defines profiles

RFC 6906 does not define a profile at the moment, and the text is compatible with the notion of a profile I propose (and I'm happy with that).

you keep saying that and i don't understand why. you're hunting for something i've seen people calling "schema" or "type": an added layer of abstraction, a model on top of some generic metamodel structure.

but i see that RFC 6906 is not clear enough. i'll try to change that to make sure things are easier to understand.

you could take any profile and turn it into a media type, severing its connections with its foundation.

Then this is the main reason why that concept of a profile is not of any use to me. The attraction of my notion of profiles is precisely that they offer something a media type cannot. It might need another name, though. Perhaps features (as in https://arxiv.org/pdf/1609.07108v2.pdf).

hmmmm.... i have a really hard time imagining how your alternative notion of a profile would be any different regarding this aspect. people could easily ignore it and keep minting media types, and there would be little you could do about it (other than disliking it).

keep in mind that the main motivation for RFC 6906 was to make media types easier to reuse and refine, so that people don't have to create media types and can create and use profiles instead. but that doesn't mean anybody can keep them from doing that, if they feel like doing it.

RubenVerborgh commented 6 years ago

RFC 6906 does not define a profile at the moment, and the text is compatible with the notion of a profile I propose (and I'm happy with that).

you keep saying that and i don't understand why.

Part a) "does not define a profile" is because RFC 6906 says "For the purpose of this specification, a profile can be described as…” but never "a profile is". Part b) "is compatible" because I cannot find a single sentence in 6906 that contradicts my interpretation.

Note that this is not changed by 4efda97908d49c3ddbaa969f5e00a782eea14566, whose commit message says "trying to make it as clear as possible that a profile is not a schema” but the actual RFC text does not state that fact. It says "an easy way to conceptualize profiles is […]", but that does not conclusively say whether or not a profile can be a schema. The clearest way IMHO is to write "a schema is not a profile".

I'm not trying to be pedantic here, but either RFC 6906 should use exact wording to say "a profile is" and "a profile is not", or many interpretations (including mine) will be compatible. If the latter is on purpose, fine (and actually my preference), but then we should not assume a strict definition of a profile based on RFC 6906.

you're hunting for something i've seen people calling "schema" or "type"

A schema seems to imply something much more strict to me. Profiles can be really light constraints.

people could easily ignore it and keep minting media types, and there would be little you could do about it (other than disliking it).

Obviously.

But the situation now is that people cannot do profiles at all (in the way we need it, with multiple profiles per resource, conneg etc.), so are forced to keep minting media types. I just want to offer an alternative, but I can't and won't force anybody.

keep in mind that the main motivation for RFC 6906 was to make media types more easy to reuse and refine, so that people don't have to create media types and can create and use profiles instead.

Yeah, but then the only distinction between a profile (based on a media type) and a media type seems to be whether somebody decides to call it a profile or a media type, especially given that you consider transclusion in a media type definition a technicality. Then whether we define something as a profile or a media type also seems to be a technicality, really.

Nonetheless, this main motivation is something we share, so it is in a sense strange that we seem to have arrived at very different conclusions from it.

I seem to be more radical in that, for me, everything that is JSON (XML) should, in an ideal future, just have a media type of application/json (application/xml), with no subtypes required. Instead, the response indicates compliance with one or multiple profiles, which allows the client to make additional assumptions about the shape and semantics of that JSON. This recognizes the fact that all processors of JSON (XML) subtypes indeed start with a JSON (XML) parser, which I do not consider a technicality since I have not heard of a single exception.

A secondary motivation for me is that the overwhelming majority of application/json API responses are underspecified: clients make many more assumptions than only application/json. Profiles can make these assumptions explicit, without having to resort to specific media types such as application/vnd.my+json that have no formal relation to application/json. Instead, they are marked as application/json tagged with profile/a and profile/b, which tells the client "use a JSON parser" and "you can make additional assumptions a and b".
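A minimal client-side sketch of that idea in Python. It assumes the profiles are signaled with the `profile` link relation (registered by RFC 6906) in an HTTP `Link` header; the profile URIs are placeholders and `profile_links` is a hypothetical, deliberately naive helper.

```python
import json
import re


def profile_links(link_header: str) -> list[str]:
    """Collect target URIs of Link header entries whose rel includes "profile".

    Naive parser (assumption): it splits on commas, so it breaks if a URI or
    parameter value contains a comma. A real client needs a full RFC 8288 parser.
    """
    profiles = []
    for part in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        uri, params = match.groups()
        rel = re.search(r';\s*rel="?([^";]+)"?', params)
        if rel and "profile" in rel.group(1).split():
            profiles.append(uri)
    return profiles


# Hypothetical response: plain JSON, tagged with two placeholder profile URIs.
headers = {
    "Content-Type": "application/json",
    "Link": '<https://example.org/profiles/a>; rel="profile", '
            '<https://example.org/profiles/b>; rel="profile"',
}
body = b'{"name": "Ruben"}'

media_type = headers["Content-Type"].split(";")[0].strip()
doc = json.loads(body)                        # "use a JSON parser": decided by the media type
assumptions = profile_links(headers["Link"])  # "you can make additional assumptions a and b"
print(media_type, assumptions)
# application/json ['https://example.org/profiles/a', 'https://example.org/profiles/b']
```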

handrews commented 6 years ago

I cannot find a single sentence in 6906 that contradicts my interpretation.

I'm really confused by this (and not just here and with you, @RubenVerborgh, I've encountered it from others at the JSON Schema project and elsewhere).

We have the RFC 6906 author telling us the intent of the RFC. And admitting that it needs clarification and working on the clarification. And I agree that having the language be more definitive would help and reduce the tendency of people to re-interpret this RFC however they please.

But given the intended definition, if we don't find his definition of "profile" useful because we need a somewhat similar but ultimately different concept or behavior, why are we trying to tell him what "profile" means? Why not just make up our own link relation / media type parameter / http preference that does what we want? That is why I am proposing a "schema" relation/profile/preference.

@RubenVerborgh I do think you bring up really interesting points about "primary" media types vs structures suffixes vs profiles vs schemas. Which I need to think more on as I just woke up and the caffeine hasn't entirely kicked in yet. I think I like the distinctions you are proposing, whether they work with the "profile" terminology or need a new name.

RubenVerborgh commented 6 years ago

why are we trying to tell him what "profile" means?

I wasn't—just trying to understand :smile: Conclusion so far: we apparently mean something different, even though 6906 doesn't state so.

Why not just make up our own link relation / media type parameter / http preference that does what we want?

I always try to reuse first. And I still can, if the phrasing of 6906 doesn't fundamentally change.

That is why I am proposing a "schema" relation/profile/preference.

Schema is too narrow, I think.

I do think you bring up really interesting points about "primary" media types

I like that notion of "primary"!

Which I need to think more on as I just woke up and the caffeine hasn't entirely kicked in yet. I think I like the distinctions you are proposing, whether they work with the "profile" terminology or need a new name.

More at https://ruben.verborgh.org/articles/fine-grained-content-negotiation/ if you like. Open to other terminology!

dret commented 6 years ago

On 2018-01-11 08:42, Henry Andrews wrote:

I cannot find a single sentence in 6906 that contradicts my interpretation.

I'm really confused by this (and not just here and with you, @RubenVerborgh, I've encountered it from others at the JSON Schema project and elsewhere).

thanks for channeling my confusion/frustration, @handrews.

We have the RFC 6906 author telling us the intent of the RFC. And admitting that it needs clarification and working on the clarification. And I agree that having the language be more definitive would help and reduce the tendency of people to re-interpret this RFC however they please.

the latest commits should be pretty clear. they say that it's not ok to add a new abstraction layer with a profile, and that it's only ok to incrementally add to an existing one. i have a hard time seeing what's hard to understand there.

https://github.com/dret/I-D/commit/4efda97908d49c3ddbaa969f5e00a782eea14566#diff-6023bdc7ab5f5743f9447d322b3846f4

But given the intended definition, if we don't find his definition of "profile" useful because we need a somewhat similar but ultimately different concept or behavior, why are we trying to tell him what "profile" means? Why not just make up our own link relation / media type parameter / http preference that does what we want? That is why I am proposing a "schema" relation/profile/preference.

that makes sense to me, if you want to signal schemas. @RubenVerborgh's vision seems a bit nebulous so far: some feature that's adding complete new abstraction layers, but it's not a schema. then how does one know how anything is represented?

@RubenVerborgh I do think you bring up really interesting points about "primary" media types vs structures suffixes vs profiles vs schemas. Which I need to think more on as I just woke up and the caffeine hasn't entirely kicked in yet. I think I like the distinctions you are proposing, whether they work with the "profile" terminology or need a new name.

i'd be more than happy to help with whatever else may crystallize. we may have a good opportunity here with the rewrite of "profile" and some momentum behind something that maybe could be made nicely complementary instead of competing.

dret commented 6 years ago

On 2018-01-11 09:57, Ruben Verborgh wrote:

why are we trying to tell him what "profile" means?

I wasn't—just trying to understand 😄 Conclusion so far: we apparently mean something different, even though 6906 doesn't state so.

i think i am simply giving up here. the draft is as clear as i can possibly make it in saying that it's not intended to be used for establishing new abstraction layers.

Why not just make up our own link relation / media type parameter / http preference that does what we want?

I always try to reuse first. And I still can, if the phrasing of 6906 doesn't fundamentally change.

that would be an odd interpretation of "reuse", after the discussions we've had so far.

RubenVerborgh commented 6 years ago

the latest commits should be pretty clear.

Truth is, you never know whether a text is clear until you ask others. Given that there are still no exact definitions, I propose to verify this assumption.

How about we ask a couple of experts to explain, based on the current text, their understanding of a profile? We could even make this very simple with a set of yes/no questions.

If they understand, we can conclude the text is clear.

they say that it's not ok to add a new abstraction layer with a profile, and that it's only ok to incrementally add to an existing one. i have a hard time seeing what's hard to understand there.

For one, when is something an "abstraction layer" and when isn't it?

But I also don't see how that statement changes anything we have discussed above (and it might very well be my own inability to understand, hence my suggestion to ask others).

@RubenVerborgh's vision seems a bit nebulous so far: some feature that's adding complete new abstraction layers, but it's not a schema. then how does one know how anything is represented?

Not necessarily a schema—it can be.

“My” profile is any set of (high-level) structural or semantic constraints. Let me clear up the nebula by making this very concrete.

Quick fictitious examples of profiles:

Note how multiple profiles can apply to the same resource. For instance, both schema-org-book and main-title could apply to a JSON-LD document.

For real-world examples, consider things such as Atom and HAL, which could have been defined as profiles rather than new MIME types. Especially the case of HAL is interesting here.

Current situation

Proposed situation (I know we can't change the past, but it's more of a “what if HAL were invented after profiles” thing for illustrative purposes)

So the client will see this as a JSON document that has the HAL structural properties and semantics (_links and _embedded) as well as the application structure and semantics (currentlyProcessing and shippedToday).

Moreover, both can be reused independently of each other.
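To illustrate the independent reuse, here is a hypothetical sketch in which the HAL structure and the application structure are two separately checked profiles of one plain JSON document. The profile URIs, the simplified HAL check (it only looks at `_links` and `_embedded`, if present), the "order summary" profile, and the counter values are assumptions for illustration, not definitions taken from the HAL specification.

```python
# Hypothetical profile URIs; neither is defined by the HAL specification.
HAL_PROFILE = "https://example.org/profiles/hal"
ORDER_SUMMARY_PROFILE = "https://example.org/profiles/order-summary"


def is_link_object(value) -> bool:
    return isinstance(value, dict) and isinstance(value.get("href"), str)


def conforms_to_hal(doc: dict) -> bool:
    """Simplified HAL check: _links/_embedded are optional but must be well-shaped."""
    links = doc.get("_links", {})
    if not isinstance(links, dict):
        return False
    for value in links.values():
        # A relation maps to one link object or to an array of link objects.
        items = value if isinstance(value, list) else [value]
        if not all(is_link_object(item) for item in items):
            return False
    return isinstance(doc.get("_embedded", {}), dict)


def conforms_to_order_summary(doc: dict) -> bool:
    """Application-level profile: two non-negative integer counters."""
    for key in ("currentlyProcessing", "shippedToday"):
        value = doc.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            return False
    return True


PROFILE_CHECKS = {
    HAL_PROFILE: conforms_to_hal,
    ORDER_SUMMARY_PROFILE: conforms_to_order_summary,
}


def satisfied_profiles(doc: dict) -> set[str]:
    return {uri for uri, check in PROFILE_CHECKS.items() if check(doc)}


# One application/json document carrying both kinds of structure.
order_list = {
    "_links": {"self": {"href": "/orders"}},
    "currentlyProcessing": 14,
    "shippedToday": 20,
}
```

A client that only knows the HAL profile can still follow the links, and one that only knows the order-summary profile can still read the counters; neither check depends on the other.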

may have a good opportunity here with the rewrite of "profile" and some momentum behind something that maybe could be made nicely complementary instead of competing.

Yes, and I honestly don't think we're that far apart. We have different ideas of what a profile should be, but if it is not specified too strictly (as is the case now), it works for both.

dret commented 6 years ago

i'd be more than happy to ask others, if that is what it takes to resolve this issue. feel free to reach out and see what we get in response!

handrews commented 6 years ago

@dret regarding:

the latest commits should be pretty clear. they say that it's not ok to add a new abstraction layer with a profile, and that's it's only ok to incrementally add to an existing one. i have a hard time seeing what's hard to understand there.

the link you supplied is adding this sentence:

An easy way to conceptualize profiles is to imagine both a media type and a profile having a schema (even though none of them need to have one). If the profile schema is a refinement/augmentation of the media type schema, and if any valid profile instance is a valid media type instance, then the profile indeed is one according to the working definition used in this specification.

I love abstract stuff. I prefer abstract descriptions. As you may have noticed over at JSON Schema, every time someone demands a concrete example I wail and gnash my teeth and bemoan that no one likes or understands my example anyway.

That said, I really cannot wrap my head around this at all. A profile is either an augmentation or refinement of a media type? Can it be both? At that point is there even much restriction to it at all? You also bring in the word "schema" which has proven confusing in this context as well. The whole thing is kind of circular and when I try to dig into it there's just not a lot of there there for me. (somewhere, someone who has struggled with my completely abstract ramblings is laughing their head off right now.)

As much as I hate to be That Guy, I think an example is in order. And perhaps more importantly, a set of counter-examples. There is value in nebulous definitions, and sometimes the easiest way to achieve that is to set some clear markers and say "these are concepts that often come up that are firmly outside of the definition."

There are three words for potentially similar concepts floating around here, all of which at least could be some sort of refinement or augmentation of an existing media type:

  • Structured suffix media types
  • Profiles
  • Schemas

Can we put some boundaries around what is appropriate for each? Rather than go back and read prior definitions, I'm going to write down my current intuition off the top of my head. It will likely be hilariously misguided, but a lot of the confusion around RFC 6906 is that people read it, develop an intuition that they don't see contradicted (as @RubenVerborgh noted) and run with it. The attitude is that everything that is not forbidden is allowed.

I'm going to stick to JSON just because I have more options to reference there that I understand pretty well. And on the topic of this likely being misguided, @dret I am not trying to impose any of this as a definition for RFC 6906bis. I just want to reset things with another starting point that comes from someone's intuition rather than the needs of another project that is hunting around for a usable concept.

Structured Suffix Media Types

A structured suffix allows you to work directly with media type-based content negotiation. They're rather heavyweight to get into the standards tree, but the vendor tree is more accessible. They feel like the most coarse-grained solution, even though some (like application/problem+json, application/json-patch+json) have very specific purposes and structure. But others are very general, adding a broad concept (hyperlinking with application/hal+json, semantic identification with application/ld+json).

Structured suffixes make the most sense to me for non-substitutable alternatives

These use cases involve selecting different ways of achieving the same goal. Adding hypermedia with application/hal+json vs application/vnd.siren+json vs application/vnd.api+json etc. While you can put a hypermedia abstraction over top of all of these, there are significant pros and cons to each approach to solving the hypermedia problem. You can't swap them with each other, and none degrade to each other. They all degrade to plain application/json.

Similarly application/merge-patch+json vs application/json-patch+json for two different ways to express how to edit another JSON document. They each have advantages, they are used for the same purpose, but they are not compatible with each other.

It's a little harder for me to fit application/problem+json into this view, as I'm not aware of other error-reporting systems. I think the reason that it feels right as a structured suffix is that it occupies a very generic role in hypermedia system communication.

In fact all of these examples do, as does application/ld+json. A full-featured system needs to be able to express application semantics, send editing instructions, report errors, and include hyperlinks. Hmm... I like this concept even if I'm not confident it is sufficient or even accurate.

Schemas

To me, schemas are the most specific concept. Just as structured suffix media types can be very specific (problem+json), schemas can be very generic (the JSON Schema meta-schema, for instance).

But if I want to express the concept of a DNS record as represented in a REST API, that's definitely not right as a structured suffix media type. The GitHub API notwithstanding, it's far too specific.

Unlike problem+json or merge-patch+json it does not play a role in generic communications. It is for representing a specific thing. Considering the meta-schema, arguably schemas play a generic role. But JSON Schema is also a structured suffix media type, application/schema+json. The meta-schema expresses which variation of that media type we're using (we'll come back to this in the Profile section).

So I feel that something that identifies a document as representing a specific concept in a specific way is a schema. Schema-described things do not occupy generic roles in communication, they are descriptions and identifications of what sort of things are being communicated.

Profiles

So where does that leave us with profiles? I feel like they are kind of in the middle, although I am not all that confident that my view is shared within this conversation :-)

Things that feel like profiles to me are things like the expired I-D for a canonicalized form of JSON (that @dret might have used as an example somewhere recently? I've lost track). Or I-JSON, which I know @dret has mentioned and even uses the word "profile" in its description.

Both of these profile candidates allow all interoperable uses of JSON, and just avoid problematic or confusing but syntactically correct documents. That's different from both playing a generic role in communication and from identifying concrete sets of things being communicated. These are refinements on how the document is structured to allow for more assumptions to be made during processing.

It wouldn't make sense to make new media types for canonicalized JSON or I-JSON. They don't add any semantics, they just restrict the syntax to something tidier, and remove ambiguous / non-interoperable / undefined semantics.
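A partial sketch of I-JSON (RFC 7493) behaving as a refinement of plain JSON: the document is first parsed as ordinary JSON and only then narrowed by extra restrictions, so everything it accepts is still valid JSON. It covers just two of the RFC's rules (unique member names, and numbers that fit IEEE 754 double precision, with integers checked against the range ±(2**53 - 1)); for simplicity it treats the numeric guidance as a hard error, and the function names are hypothetical.

```python
import json
import math

# Integers beyond this magnitude are not reliably interoperable (IEEE 754 doubles).
MAX_SAFE_INTEGER = 2**53 - 1


def _reject_duplicate_names(pairs):
    """object_pairs_hook that refuses objects with repeated member names."""
    obj = {}
    for name, value in pairs:
        if name in obj:
            raise ValueError(f"duplicate member name: {name!r}")
        obj[name] = value
    return obj


def _check_numbers(value):
    """Recursively reject numbers outside what a double can represent."""
    if isinstance(value, bool):  # bool is a subclass of int; booleans are fine
        return
    if isinstance(value, int) and abs(value) > MAX_SAFE_INTEGER:
        raise ValueError(f"integer outside the interoperable range: {value}")
    if isinstance(value, float) and math.isinf(value):
        raise ValueError("number overflows IEEE 754 double precision")
    if isinstance(value, dict):
        for item in value.values():
            _check_numbers(item)
    elif isinstance(value, list):
        for item in value:
            _check_numbers(item)


def parse_i_json_subset(text: str):
    """Parse as plain JSON, then apply a few I-JSON-style restrictions."""
    document = json.loads(text, object_pairs_hook=_reject_duplicate_names)
    _check_numbers(document)
    return document
```

Plain `json.loads` happily accepts `{"a": 1, "a": 2}` (the last value wins), which is exactly the kind of syntactically correct but non-interoperable document such a profile rules out, without changing what a JSON parser is.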

I mentioned that I'd come back to JSON Schema meta-schemas. I can see them as schemas, but I can also see them as profiles of application/schema+json because JSON Schemas ignore what they don't understand. A meta schema allows you to start understanding parts of a JSON Schema document while continuing to ignore those parts that are unrecognizable. I'm not quite sure where I'm going with this paragraph. I think I've surprised myself by saying that schemas are not profiles, but maybe meta-schemas are?

Perhaps this is a good place to stop. It's getting late-ish here and I've rambled my way into a corner. I hope that even if all of these ideas and proposed roles and definitions are completely off base, that by reacting to them we can start to put some boundaries around these concepts somehow.

dret commented 6 years ago

On 2018-01-23 08:02, Henry Andrews wrote:

That said, I really cannot wrap my head around this at all. A profile is either an augmentation or refinement of a media type? Can it be both?

to me these are kind of the same things, so yes. podcasts add new fields, so you might say they "augment" feeds. whatever it is that happens, it doesn't define a new thing. it adds to what's there.

At that point is there even much restriction to it at all? You also bring in the word "schema" which has proven confusing in this context as well.

yup, true. but people seem to want to see it here.

As much as I hate to be That Guy, I think an example is in order. And perhaps more importantly, a set of counter-examples. There is value in nebulous definitions, and sometimes the easiest way to achieve that is to set some clear markers and say "these are concepts that often come up that are firmly outside of the definition."

example: feed and podcast, where a podcast simply is a special kind of feed, and thus each podcast is a feed.

counter-example: XML and atom: atom adds a layer on top of XML, and you cannot meaningfully say "atom is XML". it is represented via XML, but when you work with atom what matters are atom abstractions and not XML abstractions anymore.
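The feed/podcast relationship can be sketched as two predicates where the profile check is defined in terms of the media type check, so every document that passes `is_podcast` necessarily passes `is_feed`. The dictionary model of a feed below is a hypothetical simplification (real feeds are RSS or Atom XML), and the "enclosure per entry" rule merely stands in for whatever a podcast profile would actually require.

```python
def is_feed(doc) -> bool:
    """Hypothetical, heavily simplified check for the base 'feed' format."""
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("title"), str)
        and isinstance(doc.get("entries"), list)
        and all(isinstance(e, dict) and isinstance(e.get("title"), str) for e in doc["entries"])
    )


def is_podcast(doc) -> bool:
    """Profile check: every feed rule still applies, plus one extra per-entry rule."""
    return is_feed(doc) and all(
        isinstance(e.get("enclosure"), dict) and isinstance(e["enclosure"].get("url"), str)
        for e in doc["entries"]
    )


podcast = {
    "title": "Example Show",
    "entries": [{"title": "Episode 1", "enclosure": {"url": "https://example.org/ep1.mp3"}}],
}
assert is_podcast(podcast) and is_feed(podcast)  # every podcast is a feed
```

The counter-example has no such shape: an Atom processor does not run "XML checks plus a few more", it works with entries, links and authors, so the XML layer is only the representation underneath.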

There are three words for potentially similar concepts floating around here, all of which at least /could/ be some sort of refinement or augmentation of an existing media type:

  • Structured suffix media types

well, that seems to be squarely in the "added layer" camp: by saying application/atom+xml, you make it clear that while the representation is XML, the actual application-level type is atom.

  • Profiles

that's the other thing, for which it seems there still needs to be a better description than augmentation/refinement. what would work best for you for something that doesn't add a layer of abstraction, but instead adds to one that's already there?

  • Schemas

that, to most, is a way to validate a document. it's a more mechanical construct in the sense that, for example, there might even be multiple schemas for any document type, either in terms of various aspects (DSDL), or in terms of schema strictness (HTML loose/strict).

Can we put some boundaries around what is appropriate for each? Rather than go back and read prior definitions, I'm going to write down my current intuition off the top of my head. It will likely be hilariously misguided, but a lot of the confusion around RFC 6906 is that people read it, develop an intuition that they don't see contradicted (as @RubenVerborgh noted) and run with it. The attitude is that everything that is not forbidden is allowed.

to some extent that's unavoidable. whatever you're doing, it will become somebody's nail they're hammering on, because that's what they see. but i agree that the particular schema/type hammer may be something that should be mentioned as not being the right thing to use here.

  Structured Suffix Media Types

A structured suffix allows you to work directly with media type-based content negotiation. They're rather heavyweight to get into the standards tree, but the vendor tree is more accessible. They feel like the most coarse-grained solution, even though some (like application/problem+json, application/json-patch+json) have very specific purposes and structure. But others are very general, adding a broad concept (hyperlinking with application/hal+json, semantic identification with application/ld+json).

hm. to me the most important thing to mention here is that in this model, you're always minting new media types. the structured suffix is just a model to make your design layers a bit more transparent, but to be honest i have never seen anybody actually build machinery around this, other than being happy about the fact that the name is a little bit more descriptive than a completely opaque identifier.

Structured suffixes make the most sense to me for non-substitutable alternatives

maybe that's because they are proper media types?

Similarly application/merge-patch+json vs application/json-patch+json for two different ways to express how to edit another JSON document. They each have advantages, they are used for the same purpose, but they are not compatible with each other.

yes, because they are different media types. they happen to share the same representation foundation, but that's just interesting to see and of no practical value.

It's a little harder for me to fit application/problem+json into this view, as I'm not aware of other error-reporting systems. I think the reason that it feels right as a structured suffix is that it occupies a very generic role in hypermedia system communication.

the only reason it has one is that it's a json format. there also is application/problem+xml (https://tools.ietf.org/html/rfc7807#section-6.2), which is an XML variant. the media type names simply provide an indication that they are the same model (abstraction layer) on top of different representations.
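As a sketch of that point, the code below splits a media type into its subtype and structured syntax suffix (the `+json` / `+xml` convention registered under RFC 6838) and falls back to a generic parser for the suffix. The function names are hypothetical. Note what the fallback yields: a JSON value or an XML element tree, not problem details; understanding the problem-details model still requires knowing the specific media type.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional

# Generic parsers keyed by structured syntax suffix.
SUFFIX_PARSERS = {
    "json": json.loads,
    "xml": ET.fromstring,
}


def split_media_type(media_type: str) -> tuple[str, str, Optional[str]]:
    """Split 'type/subtype+suffix; params' into (type, subtype, suffix)."""
    essence = media_type.split(";")[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:  # no "+" in the subtype, so there is no suffix
        return top_level, subtype, None
    return top_level, base, suffix


def parse_generic(media_type: str, body: str):
    """Use only the suffix: the result is plain JSON/XML, not the subtype's model."""
    _, _, suffix = split_media_type(media_type)
    if suffix not in SUFFIX_PARSERS:
        raise ValueError(f"no generic parser for {media_type}")
    return SUFFIX_PARSERS[suffix](body)
```

Both problem variants end up in a generic parser, which matches the observation above that the shared suffix mostly makes the name more descriptive; the shared abstraction layer lives in the subtype's specification, not in the suffix.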

In fact all of these examples do, as does application/ld+json. A full-featured system needs to be able to express application semantics, send editing instructions, report errors, and include hyperlinks. Hmm... I like this concept even if I'm not confident it is sufficient or even accurate.

to me all these discussions imply a precision and rigor in the media type system that is not there at all. people chase the dream that everything is well-defined and well-related in a completely machine-processable way, and that's never been the case and i am certainly not holding my breath.

  Schemas

So I feel that something that identifies a document as representing a specific concept in a specific way is a schema. Schema-described things do not occupy generic roles in communication, they are descriptions and identifications of what sort of things are being communicated.

a schema is an implementation vehicle. there can be many schemas for one media type. again, assuming that a schema is a complete description or a semantically complete model is very far from how things are, and very likely it will remain like this. again i am not holding my breath.

  Profiles

So where does that leave us with profiles? I feel like they are kind of in the middle, although I am not all that confident that my view is shared within this conversation :-)

they are identifiers. they identify the fact that a feed claims to be a podcast. they do not link to a schema. they do not link to anything. their mere presence is all there is. in that regard they are like media types, which also are pure identifiers.

Things that feel like profiles to me are things like the expired I-D for a canonicalized form of JSON https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00 (that @dret might have used as an example somewhere recently? I've lost track). Or I-JSON https://tools.ietf.org/html/rfc7493, which I know @dret has mentioned and even uses the word "profile" in its description.

yes, these things could identify themselves as profiles if they minted a profile URI and then used that as a signal.

Both of these profile candidates allow all interoperable uses of JSON, and just avoid problematic or confusing but syntactically correct documents. That's different from both playing a generic role in communication and from identifying concrete sets of things being communicated. These are refinements on how the document is structured to allow for more assumptions to be made during processing.

exactly.

It wouldn't make sense to make new media types for canonicalized JSON or I-JSON. They don't add any semantics, they just restrict the syntax to something tidier, and remove ambiguous / non-interoperable / undefined semantics.

well, you could very well mint media types. but that would probably defeat the purpose of saying that "I-JSON is still JSON, but a specific way of using it". a profile is a lightweight way of expressing that, without incurring the heavyweight change of "media type identity".

I mentioned that I'd come back to JSON Schema meta-schemas. I can see them as schemas, but I can also see them as profiles of application/schema+json because JSON Schemas ignore what they don't understand. A meta schema allows you to start understanding parts of a JSON Schema document while continuing to ignore those parts that are unrecognizable. I'm not quite sure where I'm going with this paragraph. I think I've surprised myself by saying that schemas are not profiles, but maybe meta-schemas are?

i need to understand more about schemas and meta-schemas to be able to comment on this. if a meta-schema is a schema, then maybe they indeed are profiles. but i would have to dig deeper.

Perhaps this is a good place to stop. It's getting late-ish here and I've rambled my way into a corner. I hope that even if all of these ideas and proposed roles and definitions are completely off base, that by reacting to them we can start to put some boundaries around these concepts somehow.

it's all good, and thanks for spending the time. i think all of this could be discussed much better f2f; it seems like these long threads help little to get us closer to a shared understanding. but let's keep trying, and thanks for doing it!

RubenVerborgh commented 6 years ago

i'd be more than happy to ask others, if that is what it takes to resolve this issue. feel free to reach out and see what we get in response!

I have created the following form: https://goo.gl/forms/Ql9rvnYigQvHXWZE3 The purpose of this form is to find out whether or not a profile and the difference with media types and schemas are clear. If they are, then I'll accept it must just be me. If they are not, then I think we should look into clarifying the document.

Any edits or suggestions?

If not, I propose to send it on the mailing lists.

dret commented 6 years ago

On 2018-02-16 11:24, Ruben Verborgh wrote:

Any edits or suggestions? If not, I propose to send it on the mailing lists.

feel free to do so. i just hope we'll get constructive comments, otherwise we'll still be where we are now.

RubenVerborgh commented 6 years ago

i just hope we'll get constructive comments,

Good point, I'll ask for suggestions as well.

RubenVerborgh commented 6 years ago

Unfortunately, the survey only attracted 4 responses so far. I have attached the anonymized results (I have shared the full results with you). I can gather more feedback if you want by directly mailing people. Let me know if that would be helpful.

That said, these results seem to confirm my suspicions about the text not being fully clear. For instance, 1 person did not know whether there is a difference between media types and profiles according to the RFC, and only 1 person found the RFC to explicitly state what that difference is.

I know you were not just looking for feedback on clarity, but also for constructive feedback—and there are suggestions in the responses. The most constructive feedback that I can give is to write in the RFC directly and literally what you mean. As an example, I refer to my earlier message that the commit message of 4efda97908d49c3ddbaa969f5e00a782eea14566 is crystal clear ("a profile is not a schema"), whereas this direct phrasing is nowhere to be found in the actually committed diff.