fcrepo / fcrepo-specification

Fedora API Specification
Apache License 2.0
17 stars 15 forks source link

Requiring support for `Content-Type: message/external-body` #37

Closed emetsger closed 7 years ago

emetsger commented 7 years ago

In section 1.3, HTTP POST:

Implementations MUST support Content-Type: message/external-body extensions for request bodies for HTTP POST that would create LDP-NRs. This content-type requires a complete Content-Type header that includes the location of the external body, e.g Content-Type: message/external-body; access-type=URL; URL=\"http://www.example.com/file\", as defined in [rfc2017].

I'm curious as to the motivation for calling out this specific content type? I initially read it as: a Fedora implementation, upon receiving a POST with Content-Type: message/external-body would dereference the URL and ingest the content of that URL (as an LDP-NR?). Later I understood that the use case was to support the idea of Fedora 3 external data streams.

I wonder if it would be better for this to be a MAY, instead of a MUST, or simply remove this paragraph all together? It seems odd to me that the specification requires all Fedora implementations to carry over a use case from Fedora 3. Or, I may just not fully understand the motivation.

awoods commented 7 years ago

The "external datastream" functionality is something that many Fedora 4 installations depend on. I see the specification of this interaction playing the beneficial role of defining a common approach for accomplishing redirection to external content. It would be interesting to discuss the possibility of downgrading this requirement from a MUST to a MAY ...but since this represents a fundamental use of Fedora, my inclination is to keep the wording as-is.

ajs6f commented 7 years ago

"external datastream" is the wrong term. It is confusing the issue. An external datastream, in Fedora 3, when retrieved, streamed through the repo allowing authZ to be employed (or disseminators to bind). There is nothing like this in F4 (the "federation" functionality was as close a match as there was).

ajs6f commented 7 years ago

We're talking about https://wiki.duraspace.org/display/FEDORA4x/External+Content-- if that resembles anything in Fedora 3, it would be Redirect Datastreams. Does that make the purpose more clear? No bits flow through the repo itself on retrieval of such a bitstream. It supports the extremely common use case of content outside the repo, the location of which is managed by the repo.

ajs6f commented 7 years ago

If we do that, people will complain that they have to follow an extra hop. The question arises: why is no one complaining about the issue you raise? Perhaps it's not hard enough to managed redirect behavior in a client to annoy people? Don't know.

ajs6f commented 7 years ago

That's only our own code. That doesn't mean anything. If we held ourselves to the standards to which we hold others, we'd never get anywhere.

I don't have really strong opinions about this. I leave it to @awoods to comment on how serious an annoyance the lack of a facility like this would be perceived as by users.

birkland commented 7 years ago

@acoburn Ha, I was just about to link to that issue. I'm beginning to question how well how such redirects interoperate with the LDP spec at all. For example, the resource at the other end of the redirect may or may not be an LDPR, which section 4.4.1.1 says an LDP-NR MUST be. Could that confuse LDP clients?

emetsger commented 7 years ago

@acoburn yes, that was more what I was thinking: is there a "LDP-way" of doing things?
@ajs6f @awoods thanks for the clarification!

It seems like there's at least two different patterns for obtaining externally-hosted LDPNRs, and it just seemed odd to me that the spec used MUST in this section.

ajs6f commented 7 years ago

I think it's more fine-grained that that. There's managing the URI, there's managing the dereferencing, and there's managing the bits. We're trying to let people decide to give some but not all of those things to the repo.

ajs6f commented 7 years ago

@birkland We can make that more clear, if we want to keep this. We can say that the external-body thing doesn't have to be an LDPR at all, if we want to. It's not governed by LDP, it's our extension.

birkland commented 7 years ago

@ajs6f I think the spec would need to do something like that. The worrisome part to me is that the LDP spec defines the notion of containment as fundamentally a relationship between LDPRs:

Containment The relationship binding an LDPC to LDPRs whose lifecycle it controls and is aware of.

So if these external body resources are not LDPRs, then as-written the LDP spec would suggest that containment is not an appropriate concept to apply to them (i.e. a container can't contain them via ldp:contains). We'd need to relax that part, I think?

ajs6f commented 7 years ago

I think the relationship is between resources, but it is accessed via URIs (and because this is HTTP, only via URIs). So the fact that we are dealing with two different URIs (the one requested and the one to which we redirect) is what matters for your objection.

In any event, I really don't care that much. I think that if we remove this feature, we will get a slew of different and in some cases incompatible patterns, and I would rather not see that, but maybe we won't.

ajs6f commented 7 years ago

To be clear, I think that the fact that we have two different URIs here means that from the HTTP POV, we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

birkland commented 7 years ago

we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

Exploring this perspective a little bit, would it be appropriate for the spec to say that the repository MUST return a 303 response to all requests to the resource? I think that can help address my concern with reconciling with the LDP spec, and highlight the difference between the two.

Still, the issue remains that clients that transparently follow redirects are likely to be confused, (as in the recent UI issue), at least if they're looking to ascertain the nature of the resource, discover its describedby description, etc. via inspecting headers. If external content redirects remain in the spec, I think there ought to be at least some non-normative language that explains this.

ajs6f commented 7 years ago

MUST is fine by me-- isn't that what it says now? @barmintor moved that behavior under GET and somewhere else, can't remember where.

"clients that transparently follow redirects" are intentionally hiding the nature of the web they are traversing. That's something they can do, but it's not unreasonable that in that case, they're going to miss some details. Like description. 303 is not 200. I don't think we should conflate them.

birkland commented 7 years ago

@ajs6f It says 3xx now; I'm wondering if it should specifically say 303.

As far as transparently following redirects, I just think the semantics and behavior spelled out in the spec related to external content are unusual enough that it deserves some prose.

ajs6f commented 7 years ago

@birkland No prob with extra non-normative language to clarify things. Why 303 in particular? That doesn't square for me. What about impls that might want to fully proxy POST (for authZ)?

awoods commented 7 years ago

@birkland : The current implementation responds to GET requests with 307. Do you feel strongly that 303 is more suitable?

And I agree with adding "extra non-normative language to clarify things".

Whichever status code is most appropriate on GET (303 or 307), further specifying it makes sense. @ajs6f : do you have concerns on GET or just POST?

ajs6f commented 7 years ago

It appears to me to be overspecified behavior. We don't have to have a reason to leave it as 3xx; we need a reason to be any more specific than that.

birkland commented 7 years ago

@ajs6f @awoods I think a 303 is the only response response makes sense If the following is true:

we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

So if a client does a GET (or POST, or whatever) on a one of these external-body repository URIs, the repository sends them elsewhere:

303 response to a GET request indicates that the origin server does not have a representation of the target resource that can be transferred by the server over HTTP. However, the Location field value refers to a resource that is descriptive of the target resource, such that making a retrieval request on that other resource might result in a representation that is useful to recipients without implying that it represents the original target resource

This seems to describe what this redirect is trying to achieve, right? What's being redirected to is not an LDPR managed by the repository, it's something else. The client needs to interact separately with that resource in order to discover what it is, what can be done with it, etc.

A 307 would imply:

target resource resides temporarily under a different URI and the user agent MUST NOT change the request method if it performs an automatic redirection to that URI

That seems strictly contrary to this external-content-as-distinct-from-the-ldpr scenario, right?

birkland commented 7 years ago

Looking back at the spec, it only says that a 3xx must be returned by a GET. Does that mean that all other interactions (e.g. PUT, POST, HEAD) would not result in an a redirect at all, and would instead interact with the repository LDPR (e.g. changing the external URI, deleting the LDPR, etc)?

awoods commented 7 years ago

@birkland , that is my understanding.

barmintor commented 7 years ago

@birkland I would still expect HEAD to return the same status as GET.

As to the original issue: I'm lukewarm on it, to be honest. There are persistent requests for external content references, but this is not that. If we don't have this, but instead a link, I'd argue that we no longer have a LDP-NR at all, just a LDP-RS that has an additional rdf:type (PCDM folks antennae go up). That might be fine.

awoods commented 7 years ago

@barmintor , I share a similar sentiment in that my primary concern is supporting the requirement for external content. I would be interested to hear why you do not believe the specified approach is not that.

I believe the use case we are trying to address with the message/external-body Content-Type is the ability to create an LDP-NR in the repository, the body of which is a resource found outside of the repository. Additionally, requests for that LDP-NR are expected to redirect to the external content.

The current approach does this in a standardized way with (RFC3230); but if there is another suggestion that satisfies the requirement, let's discuss.

barmintor commented 7 years ago

@awoods I'm happy to discuss in another issue?

awoods commented 7 years ago

@barmintor , ok. Does that imply that we close this one as: "leave as-is for now"?

birkland commented 7 years ago

@awoods I don't think the spec is acceptable as-is, but if the intent of the new issue is to fix it, that would be fine.

birkland commented 7 years ago

@barmintor @awoods I don't believe the spec currently spells out what happens on HEAD requests to resources with external bodies, so as-written it would suggest it's up to the implementation to do something reasonable.

ajs6f commented 7 years ago

@birkland I think there is an obvious choice (return a 307). I would be happy to see that written down.

awoods commented 7 years ago

@birkland : To close this issue, may I suggest:

  1. We decide on whether to specify 303? 307? or 3xx?
  2. We add text that "spells out what happens on HEAD", i.e. also redirect with 3?? (as decided above)
  3. We add non-normative clarifying text per: https://github.com/fcrepo/fcrepo-specification/issues/37#issuecomment-277079418
birkland commented 7 years ago

@awoods Sure, though I think feedback from @barmintor and @acoburn would suggest that use cases around external content may be addressable just by creating an appropriate LDP-RS that link to the content, rather than creating a redirecting LDP-NR. So to your list a step 0?

  1. Decide if creating LDP-NRs that redirect via external-body shall remain in the spec

.. if the answer is 'yes', then we go on to 1, 2, and 3 to clarify the existing spec?

barmintor commented 7 years ago

@birkland :+1: the subsequent list might be worth pursuing, just not the question in this issue. I would also note that there is nothing in RFC2017 that mandates a redirect, as far as I can tell.

ajs6f commented 7 years ago

Part of the problem here is that we here lots of "use cases" to the effect of "I want to put stuff outside the repo but still be able to tell people that I use Fedora."

What we need are use cases that lay out explicit patterns of interaction. I don't think we have many of those. I would like to encourage people to share common patterns for interactions of this ilk, but I don't even know if that is possible because I don't know what people are trying to do when they ask to "put stuff outside the repo". Do they want "projection"? Linking? Redirects? A big rubber band that goes from one server to the other?

awoods commented 7 years ago

@ajs6f : I agree around the ambiguity of what people actually want. I will also note that based on our understanding of what they want, we have provided the external-body mechanism. They are using it and it seems to fit the bill.

barmintor commented 7 years ago

@awoods it only fits the bill if the client can resolve the URL, right? This is actually a major roadblock to migration at CU.

birkland commented 7 years ago

Shall I go ahead and create new issues for updating/clarifying the existing spec (using Andrew's list of three), and also for the more existential "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST", then close this issue?

barmintor commented 7 years ago

@birkland I think the "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST" is this ticket, right?

awoods commented 7 years ago

@barmintor : can you describe the "major roadblock to migration at CU" comment?" ..that is starting to sound like a real use case.

birkland commented 7 years ago

New issue #40 created to fix existing spec.

Scoping this issue to "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST", my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate.

awoods commented 7 years ago

@acoburn : that is a fair point. Currently on the table are both ends of the spectrum:

In the new issue #40 (thanks @birkland !) we can collect the requirements, including the ones you just noted to help drive towards a more comfortable resolution.

barmintor commented 7 years ago

@awoods I think iana:describes must be on the LDP-RS that describes some other binary resource. If we don't redirect, we can still use external-body. Using other links really means the resource in question is not a LDP-NR.

awoods commented 7 years ago

@barmintor : thanks, I updated my comment.

ajs6f commented 7 years ago

Using other links really means the resource in question is not a LDP-NR.

@barmintor, this confuses me: I see that it would mean that we can't infer from LDP alone that it is an LDP-NR, but how can we infer that it definitely is not?

barmintor commented 7 years ago

@ajs6f if we have an existing HTTP mechanism for documenting a reference in Content-Type: external-body, and another one for redirecting in 3xx with Location: ..., and we're not a RDF source, and we're not binary content, I don't know what we're talking about anymore.

ajs6f commented 7 years ago

@barmintor That's not my point. My question is: if I have an RDF source (in an LDP server) and it has a link (of some non-iana:describes variety) to something elsewhere in the world, what on earth lets you infer that that something else isn't an LDP-NR? It doesn't seem to me that you know anything about that other resource, on an LDP basis. It could be an LDP-NR or a fried egg, for all we know.

barmintor commented 7 years ago

@ajs6f you have me backwards. I'm saying that if a resource itself doesn't have those things, that resource is not a LDP-NR.

ajs6f commented 7 years ago

You're saying that if a resource doesn't something that describes it it cannot be an LDP-NR?

barmintor commented 7 years ago

No, I'm saying if GET X is not local binary content, and GET X doesn't have Content-Type: external-body, and GET X doesn't have Location: ..., but GET X has linked data that points to other resources, X is not a LDP-NR.

ajs6f commented 7 years ago

Okay, this side issue between @barmintor and myself has been discussed in IRC, and it turns out to be a disagreement about what kinds of resources can use what kinds of links to set up a relationship of description/described, which is not at all an important part of this ticket. Sorry for sidetracking things!

ruebot commented 7 years ago

Reading through this issue, and trying to figure out if this one should be closed since it is superseded by https://github.com/fcrepo/fcrepo-specification/issues/40 and https://github.com/fcrepo/fcrepo-specification/issues/41. If that's the case, shall we close this issue?

birkland commented 7 years ago

@ruebot Looking through the comments, it looks like there isn't consensus on keeping MUST in the language:

Quoth @acoburn

it seems to me that this feature fits rather uncomfortably with the other components of the specification.

Quoth @barmintor

If we don't redirect, we can still use external-body (I think that's suggesting that returning the originally supplied message/external-body rather than a redirect would be a reasonable thing to do)

Quoth me

my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate

So how about this: Change to language MAY or SHOULD (then close this ticket), clarify the specification with #40 , and explain this all in non-normative text in #41 ?