how to process unknown link parameters

dret commented 7 years ago

https://github.com/dret/I-D/issues/74#issuecomment-322427054 suggests requiring that unknown link parameters be ignored, and this seems important enough to discuss as a standalone issue. it seems like this would seriously impact the way links can be passed around by intermediaries. for example, an intermediary with the job of serializing HTTP Link headers would have to drop all link parameters that it does not know, instead of being able to serialize them. furthermore, only link parameters that have a defined JSON serialization could be serialized; all others by definition would have to be dropped. this seems like a rather drastic departure from the usual web model of passing information around, such as HTTP's requirement to pass on headers (instead of silently dropping them).

hvdsomp commented 7 years ago

It could also mean:

pass around what you have received if you are a syndicator
ignore what you don't understand if you're actually consuming

dret commented 7 years ago

another aspect to consider here is that if a link parameter is unknown, it could be repeatable. what would that mean for a model where unknown link parameters are not dropped? it might be something like requiring serialization into some delimiter-separated list within the string-value, such as whitespace, comma, or semicolon.
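
A minimal sketch of that delimiter idea, in Python. The function name `fold_repeated` and the default delimiter (a single space) are assumptions made for illustration; nothing in the draft defines them. Parameters are modeled as a list of (name, value) pairs in the order they appeared.

```python
def fold_repeated(pairs, delimiter=" "):
    """Fold (name, value) pairs into one string per name, joining repeats."""
    folded = {}
    for name, value in pairs:
        if name in folded:
            # Second or later occurrence: append to the existing string.
            folded[name] = folded[name] + delimiter + value
        else:
            folded[name] = value
    return folded


pairs = [("rel", "next"), ("x", "123"), ("x", "abc")]
print(fold_repeated(pairs))
```

One limitation of this approach: if a value can itself contain the delimiter, a reader of the folded string can no longer tell one value from several, so the folding is not reversible in general.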

dret commented 7 years ago

On 2017-08-15 14:18, Herbert Van de Sompel wrote:

It could also mean:

pass around what you have received if you are a syndicator

ignore what you don't understand if you're actually consuming

but that's what's happening anyway. if i am consuming and need to understand a link parameter, then i am of course ignoring the ones that i do not understand.

but if my job in the pass-around role is to pass around links that i see in link headers, and i am passing them around in their JSON serialization, how do i do this?

hvdsomp commented 7 years ago

Then you need to understand, because you consume prior to re-serializing.

dret commented 7 years ago

On 2017-08-15 14:29, Herbert Van de Sompel wrote:

Then you need to understand, because you consume prior to re-serializing.

so you define an intermediary as a component that does not parse or serialize? that's a bit unusual. in that case the notion of an intermediary is not very helpful. regardless of terminology, what you're saying is that such a component (regardless of its name) that is tasked with serializing HTTP Link header fields would not be allowed to serialize any unknown link parameters. that would make it impossible to have generic components for monitoring, logging, and other tasks that otherwise could benefit from the JSON serialization.

asbjornu commented 7 years ago

Do you need to serialize what you have parsed (and understood)? Isn't it possible to parse, deserialize and then just pass the untouched source on to the next processor? Then the parser can understand whatever it needs to in order to do what it wants without affecting other processors and possibly screwing up by removing or wrongly serializing unknown parameters.

hvdsomp commented 7 years ago

Just for the record: I don't know where the language about ignoring unknown attributes in the I-D came from; I don't remember adding it. But, honestly, I think it makes a lot of sense. Let's not forget we are talking about third party resources that pass on links that pertain to other resources. Maybe they should know what they are doing? For example, in most cases, they will need to add anchor attributes because the links don't pertain to themselves.

I think that the interesting bit about the suggestion I made re handling repeatable attributes in JSON is that it is immediately clear from a payload what is repeatable and what is not. So, if we can assume that a resource that uses repeatable attributes serializes appropriately in JSON, then downstream applications can serialize appropriately both in JSON and in the native format. Unfortunately, not the other way around (native to JSON) because the native format has no way to express whether attributes are repeatable. In this case, the consuming application needs to be in the know to transform from native to JSON.

dret commented 7 years ago

On 2017-08-15 14:50, Asbjørn Ulsberg wrote:

Do you need to serialize what you have parsed (and understood)? Isn't it possible to parse, deserialize and then just pass the untouched source on to the next processor? Then the parser can understand whatever it needs to in order to do what it wants without affecting other processors and possibly screwing up by removing or wrongly serializing unknown parameters.

let's chat at RESTfest! my current concern is about a scenario where an intermediary monitors HTTP traffic, looks at Link headers, and is attempting to serialize those as JSON so that they can be accessed as JSON data. the question is: can unknown link parameters be represented in JSON? one option is to say "no". another option is to say "yes", but then there needs to be a definition of the "how" as well, and one that does not require knowledge of specific link parameters.

hvdsomp commented 7 years ago

That is indeed a pertinent issue, see my above comment.

dret commented 7 years ago

On 2017-08-15 14:52, Herbert Van de Sompel wrote:

Just for the record: I don't know where the language about ignoring unknown attributes in the I-D came from; I don't remember adding it. But, honestly, I think it makes a lot of sense. Let's not forget we are talking about third party resources that pass on links that pertain to other resources. Maybe they should know what they are doing? For example, in most cases, they will need to add anchor attributes because the links don't pertain to themselves.

we did change the focus of the media types to be scenario-agnostic (#70), so this discussion is about any usage of the media types.

I think that the interesting bit about the suggestion I made re handling repeatable attributes in JSON is that it is immediately clear from a payload what is repeatable and what is not.

that's true for JSON but not true for native. so if i am attempting to serialize data i received via native into JSON, then i won't know whether that value is repeatable or not; all i can tell from looking at a link is whether the value is repeated.

So, if we can assume that a resource that uses repeatable attributes serializes appropriately in JSON, then downstream applications can serialize appropriately both in JSON and in the native format. Unfortunately, not the other way around (native to JSON) because the native format has no way to express whether attributes are repeatable. In this case, the consuming application needs to be in the know to transform from native to JSON.

yes, and that's at the heart of this issue. in 99.9% of cases on the web, the starting point will be native (when monitoring web traffic and looking at headers), so this is an important part of the puzzle.
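
The repeated-versus-repeatable distinction can be sketched in a few lines of Python. The helper name `repeated_names` and the sample parameters are hypothetical; the point is only that a native link tells you which names occurred more than once, not which names are allowed to.

```python
from collections import Counter


def repeated_names(pairs):
    """Return the parameter names that occur more than once in this link."""
    counts = Counter(name for name, _value in pairs)
    return {name for name, count in counts.items() if count > 1}


observed = [("rel", "next"), ("x", "123"), ("x", "abc"), ("y", "only-once")]
print(sorted(repeated_names(observed)))
```

Here "x" is visibly repeated, but nothing in the data says whether "y" is repeatable and simply happened to occur once.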

mamund commented 7 years ago

just dropping in here...

JSON's limits on the uniqueness of key elements is something i deal w/ often in designing representations. what i do is create anonymous objects that can appear within an array.

for example:

{
  ...
  "params": [
    {"name":"url","value":"..."},
    {"name":"rel","value":"..."},
    {"name":"x","value":"123"},
    {"name":"x","value":"abc"}
  ],
  ...
}

this approach can be applied to all parameters (above) or just the "unknown" parameters:

{
  "url" : "...",
  "rel" : "...",
  "params": [
    {"name":"x","value":"123"},
    {"name":"x","value":"abc"}
  ],
  ...
}
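
A rough Python sketch of the second layout, converting a list of (name, value) pairs to it and back. The names `to_layout`, `from_layout` and the `KNOWN` tuple are assumptions for illustration, and the sketch assumes each "known" name occurs at most once; a second occurrence is simply kept in "params".

```python
KNOWN = ("url", "rel")  # assumption: the names treated as "known" above


def to_layout(pairs, known=KNOWN):
    """Put known names at the top level and everything else in "params"."""
    doc = {}
    extra = []
    for name, value in pairs:
        if name in known and name not in doc:
            doc[name] = value
        else:
            # Unknown (or repeated) names keep their order and duplicates.
            extra.append({"name": name, "value": value})
    doc["params"] = extra
    return doc


def from_layout(doc, known=KNOWN):
    """Rebuild the (name, value) pairs: known names first, then "params"."""
    pairs = [(name, doc[name]) for name in known if name in doc]
    pairs += [(p["name"], p["value"]) for p in doc.get("params", [])]
    return pairs


pairs = [("url", "/"), ("rel", "next"), ("x", "123"), ("x", "abc")]
assert from_layout(to_layout(pairs)) == pairs
```

Because the array holds anonymous objects, repeated names survive without relying on duplicate JSON keys. The relative order of known and unknown parameters is normalized (known first), which this sketch treats as acceptable.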

hvdsomp commented 7 years ago

Very interesting. Thanks for the insight, @mamund

mamund commented 7 years ago

@hvdsomp

no problem.

also, FWIW, i think it is wise to follow the HTTP proxy processing pattern (from RFC1945: "Unrecognized header fields should be ignored by the recipient and forwarded by proxies.") here.

it's possible that you may be thinking more along the lines of HTML: “The HTML parser will ignore tags which it does not understand, and will ignore attributes which it does not understand…”

however, i think what you're implementing here is more likely to be used by intermediaries (proxies) rather than clients (e.g HTML browsers).

so, i would not drop things we don't understand -- just pass them along "as-is".
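
The two roles can be contrasted in a short Python sketch. The function names `consume` and `forward` are hypothetical, and parameters are modeled as (name, value) pairs; this is only an illustration of the two policies, not a proposal for the draft.

```python
def consume(pairs, understood):
    """Recipient behavior: act only on the parameters that are understood."""
    return [(name, value) for name, value in pairs if name in understood]


def forward(pairs):
    """Intermediary behavior: pass every parameter along unchanged."""
    return list(pairs)


pairs = [("rel", "next"), ("x", "123"), ("x", "abc")]
print(consume(pairs, {"rel"}))
print(forward(pairs))
```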

dret commented 7 years ago

On 2017-08-15 16:21, Mike Amundsen wrote:

@hvdsomp also, FWIW, i think it is wise to follow the HTTP proxy processing pattern (from RFC1945: "Unrecognized header fields should be ignored by the recipient and forwarded by proxies.") here.

it's possible that you may be thinking more along the lines of HTML: “The HTML parser will ignore tags which it does not understand, and will ignore attributes which it does not understand…”

however, i think what you're implementing here is more likely to be used by intermediaries (proxies) rather than clients (e.g HTML browsers).

good references! and yes, when things are not forwarded then we don't really have to think about how/if they are represented. but if they are forwarded, that's what we're discussing here. i'd definitely prefer the HTTP behavior here (we're talking about HTTP link headers after all), instead of mandating to ignore unknown fields.

hvdsomp commented 7 years ago

Sorry to repeat myself, but links will in most scenarios not just be forwarded because the link context will change when they are being forwarded. Only for links that have explicit link context and link target would it be possible to blindly forward.

dret commented 7 years ago

On 2017-08-15 16:43, Herbert Van de Sompel wrote:

Sorry to repeat myself, but links will in most scenarios not just be forwarded because the link context will change when they are being forwarded. Only for links that have explicit link context and link target would it be possible to blindly forward.

any kind of HTTP monitoring solution will be interested in links. these will be captured within the context of the HTTP request.

any kind of HTTP proxying solution forwarding links from native to JSON will rewrite links to add anchors, iff the request context changes. if it does and the rewriting happens, it would be odd if we required dropping some link parameters from the original data.

dret commented 7 years ago

just as a reminder: if we make ignoring unknown link parameters mandatory, that means we're creating a representation that critically depends on schema knowledge. that's neither a popular nor a usually successful design on the web. in practical terms:

Link: </>; rel="http://example.net/foo"; this="x"; might="y"; be="z"; important="right?"

then must be serialized like this:

{"href":"/","rel":"http://example.net/foo"}

unless the serializer knows all the link parameters and they all have defined JSON serializations.
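
The difference can be sketched in Python, under stated assumptions. The parser below is deliberately simplified: it handles a single link-value with `name="value"` or `name=token` parameters and does not deal with escaped quotes, valueless parameters, or multiple links. The function names are hypothetical, and the name/value array used for the schema-agnostic output is borrowed from @mamund's example above, not from the draft (#74 is where the actual representation is decided).

```python
import json
import re

LINK = '</>; rel="http://example.net/foo"; this="x"; might="y"; be="z"; important="right?"'

# name = "quoted value" or name = token; simplified, no escape handling.
PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))')


def parse_link(value):
    """Return (target, [(name, value), ...]) for one simple link-value."""
    target = re.match(r'\s*<([^>]*)>', value)
    if target is None:
        raise ValueError("expected a <target> at the start")
    pairs = []
    for m in PARAM.finditer(value, target.end()):
        quoted, token = m.group(2), m.group(3)
        pairs.append((m.group(1), quoted if quoted is not None else token))
    return target.group(1), pairs


def serialize_known_only(target, pairs, known=("rel",)):
    """Schema-dependent: parameters outside `known` are silently dropped."""
    doc = {"href": target}
    doc.update((n, v) for n, v in pairs if n in known)
    return json.dumps(doc, separators=(",", ":"))


def serialize_all(target, pairs):
    """Schema-agnostic: every parameter is kept as a name/value object."""
    params = [{"name": n, "value": v} for n, v in pairs]
    return json.dumps({"href": target, "params": params}, separators=(",", ":"))


target, pairs = parse_link(LINK)
print(serialize_known_only(target, pairs))
print(serialize_all(target, pairs))
```

The first call reproduces the lossy output shown above; the second keeps all five parameters without the serializer knowing anything about them.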

asbjornu commented 7 years ago

Ok, so the problem at hand is that information is lost when converting from Link to JSON and back again. What I don't really understand is why this is something someone wants to do.

I can understand the creation of a JSON representation of the same abstract model as that of Link, but why will the two concrete serializations be used by a single processor? What is the use-case of converting from Link to JSON and back again?

dret commented 7 years ago

On 2017-08-16 04:04, Asbjørn Ulsberg wrote:

I can understand the creation of a JSON representation of the same abstract model as that of |Link|, but why will the two concrete serializations be used by a single processor? What is the use-case of converting from |Link| to JSON and back again?

a rather typical use case we are looking at is that you have an HTTP-focused component such as a proxy, which captures and exposes HTTP information to JSON-focused components such as applications that want to analyze and work on HTTP traces and logs without the need to parse specific syntaxes.

if our representation is categorically unable to represent links without schema-awareness, we make it less useful. i have to admit that i am struggling a bit here to understand the downsides of a schema-agnostic representation (other than opportunities for schema-specific JSON optimization).

asbjornu commented 7 years ago

exposes HTTP information to JSON-focused components such as applications that want to analyze and work on HTTP traces and logs without the need to parse specific syntaxes.

Ok. This requires conversion from Link to JSON. But not back again from JSON to Link, am I correct?

dret commented 7 years ago

On 2017-08-16 14:43, Asbjørn Ulsberg wrote:

Ok. This requires conversion from |Link| to JSON. But not back again from JSON to |Link|, am I correct?

true for this scenario. i don't have such a use case somewhere now, but i could easily imagine tooling that accepts JSON-structured links (which might be more convenient for programmers to manage) and then injects those into HTTP as link headers.

such tooling could even do this on a case-by-case basis as in @hvdsomp's scenarios: create link headers if there are not too many links, or add a "linkrel" link and provide an application/linkrel+json resource if there are too many links for inline representation in the header.
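
A hedged Python sketch of such tooling. The threshold `MAX_INLINE_LINKS`, the function names, the `type` hint on the pointer link, and the JSON body layout are all assumptions made for illustration; only the "linkrel" relation and the application/linkrel+json media type come from the discussion. Parameter values are assumed to contain no double quotes or backslashes, so no escaping is done.

```python
import json

MAX_INLINE_LINKS = 10  # assumption: arbitrary cut-off, not taken from any spec


def format_link(target, pairs):
    """Format one link-value from a target and (name, value) pairs."""
    return "<" + target + ">" + "".join('; {}="{}"'.format(n, v) for n, v in pairs)


def plan_links(links, linkset_url, max_inline=MAX_INLINE_LINKS):
    """Return (link_header_value, json_body_or_None) for [(target, pairs), ...]."""
    if len(links) <= max_inline:
        # Few links: put them all inline in the Link header.
        return ", ".join(format_link(t, p) for t, p in links), None
    # Too many links: serve them as a JSON resource and point to it.
    body = [{"href": t, "params": [{"name": n, "value": v} for n, v in p]}
            for t, p in links]
    pointer = format_link(linkset_url, [("rel", "linkrel"),
                                        ("type", "application/linkrel+json")])
    return pointer, json.dumps(body)


header, body = plan_links([("/a", [("rel", "next")])], "/links")
print(header, body)
```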

asbjornu commented 7 years ago

Yes, but that will only require conversion from JSON to Link, right? It's only the full roundtrip that is problematic, afaict, and I can't think of a good use-case for a full roundtrip of the two serializations.

dret commented 7 years ago

On 2017-08-16 14:59, Asbjørn Ulsberg wrote:

Yes, but that will only require conversion from JSON to |Link|, right? It's only the full roundtrip that is problematic, afaict, and I can't think of a good use-case for a full roundtrip of the two serializations.

same here, for now. but if we have reasonable scenarios for both directions, that gives us a good starting point, or not? what would be changed if we had one scenario requiring a full roundtrip?

hvdsomp commented 7 years ago

The lossless round trip is enabled by the serialization proposal I submitted earlier today. I am not sure what the problem is anymore.

dret commented 7 years ago

suggested resolution for this issue:

as initially suggested, the representation does not require schema information and there is no requirement to ignore unknown parameters.
all parameters are therefore treated uniformly.
#74 handles the specifics of how they are represented.

hvdsomp commented 7 years ago

Agreed. As far as I am concerned this issue can be closed.

dret / I-D

how to process unknown link parameters #76
