Closed emetsger closed 7 years ago
The "external datastream" functionality is something that many Fedora 4 installations depend on. I see the specification of this interaction playing the beneficial role of defining a common approach for accomplishing redirection to external content. It would be interesting to discuss the possibility of downgrading this requirement from a MUST to a MAY ...but since this represents a fundamental use of Fedora, my inclination is to keep the wording as-is.
"external datastream" is the wrong term. It is confusing the issue. An external datastream, in Fedora 3, when retrieved, streamed through the repo allowing authZ to be employed (or disseminators to bind). There is nothing like this in F4 (the "federation" functionality was as close a match as there was).
We're talking about https://wiki.duraspace.org/display/FEDORA4x/External+Content-- if that resembles anything in Fedora 3, it would be Redirect Datastreams. Does that make the purpose more clear? No bits flow through the repo itself on retrieval of such a bitstream. It supports the extremely common use case of content outside the repo, the location of which is managed by the repo.
If we do that, people will complain that they have to follow an extra hop. The question arises: why is no one complaining about the issue you raise? Perhaps it's not hard enough to managed redirect behavior in a client to annoy people? Don't know.
That's only our own code. That doesn't mean anything. If we held ourselves to the standards to which we hold others, we'd never get anywhere.
I don't have really strong opinions about this. I leave it to @awoods to comment on how serious an annoyance the lack of a facility like this would be perceived as by users.
@acoburn Ha, I was just about to link to that issue. I'm beginning to question how well how such redirects interoperate with the LDP spec at all. For example, the resource at the other end of the redirect may or may not be an LDPR, which section 4.4.1.1 says an LDP-NR MUST be. Could that confuse LDP clients?
@acoburn yes, that was more what I was thinking: is there a "LDP-way" of doing things?
@ajs6f @awoods thanks for the clarification!
It seems like there's at least two different patterns for obtaining externally-hosted LDPNRs, and it just seemed odd to me that the spec used MUST in this section.
I think it's more fine-grained that that. There's managing the URI, there's managing the dereferencing, and there's managing the bits. We're trying to let people decide to give some but not all of those things to the repo.
@birkland We can make that more clear, if we want to keep this. We can say that the external-body thing doesn't have to be an LDPR at all, if we want to. It's not governed by LDP, it's our extension.
@ajs6f I think the spec would need to do something like that. The worrisome part to me is that the LDP spec defines the notion of containment as fundamentally a relationship between LDPRs:
Containment The relationship binding an LDPC to LDPRs whose lifecycle it controls and is aware of.
So if these external body resources are not LDPRs, then as-written the LDP spec would suggest that containment is not an appropriate concept to apply to them (i.e. a container can't contain them via ldp:contains
). We'd need to relax that part, I think?
I think the relationship is between resources, but it is accessed via URIs (and because this is HTTP, only via URIs). So the fact that we are dealing with two different URIs (the one requested and the one to which we redirect) is what matters for your objection.
In any event, I really don't care that much. I think that if we remove this feature, we will get a slew of different and in some cases incompatible patterns, and I would rather not see that, but maybe we won't.
To be clear, I think that the fact that we have two different URIs here means that from the HTTP POV, we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.
we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.
Exploring this perspective a little bit, would it be appropriate for the spec to say that the repository MUST return a 303 response to all requests to the resource? I think that can help address my concern with reconciling with the LDP spec, and highlight the difference between the two.
Still, the issue remains that clients that transparently follow redirects are likely to be confused, (as in the recent UI issue), at least if they're looking to ascertain the nature of the resource, discover its describedby
description, etc. via inspecting headers. If external content redirects remain in the spec, I think there ought to be at least some non-normative language that explains this.
MUST
is fine by me-- isn't that what it says now? @barmintor moved that behavior under GET
and somewhere else, can't remember where.
"clients that transparently follow redirects" are intentionally hiding the nature of the web they are traversing. That's something they can do, but it's not unreasonable that in that case, they're going to miss some details. Like description. 303 is not 200. I don't think we should conflate them.
@ajs6f It says 3xx now; I'm wondering if it should specifically say 303.
As far as transparently following redirects, I just think the semantics and behavior spelled out in the spec related to external content are unusual enough that it deserves some prose.
@birkland No prob with extra non-normative language to clarify things. Why 303 in particular? That doesn't square for me. What about impls that might want to fully proxy POST
(for authZ)?
@birkland : The current implementation responds to GET
requests with 307. Do you feel strongly that 303 is more suitable?
And I agree with adding "extra non-normative language to clarify things".
Whichever status code is most appropriate on GET
(303 or 307), further specifying it makes sense.
@ajs6f : do you have concerns on GET
or just POST
?
It appears to me to be overspecified behavior. We don't have to have a reason to leave it as 3xx; we need a reason to be any more specific than that.
@ajs6f @awoods I think a 303 is the only response response makes sense If the following is true:
we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.
So if a client does a GET
(or POST
, or whatever) on a one of these external-body repository URIs, the repository sends them elsewhere:
303 response to a GET request indicates that the origin server does not have a representation of the target resource that can be transferred by the server over HTTP. However, the Location field value refers to a resource that is descriptive of the target resource, such that making a retrieval request on that other resource might result in a representation that is useful to recipients without implying that it represents the original target resource
This seems to describe what this redirect is trying to achieve, right? What's being redirected to is not an LDPR managed by the repository, it's something else. The client needs to interact separately with that resource in order to discover what it is, what can be done with it, etc.
A 307
would imply:
target resource resides temporarily under a different URI and the user agent MUST NOT change the request method if it performs an automatic redirection to that URI
That seems strictly contrary to this external-content-as-distinct-from-the-ldpr scenario, right?
Looking back at the spec, it only says that a 3xx must be returned by a GET. Does that mean that all other interactions (e.g. PUT, POST, HEAD) would not result in an a redirect at all, and would instead interact with the repository LDPR (e.g. changing the external URI, deleting the LDPR, etc)?
@birkland , that is my understanding.
@birkland I would still expect HEAD to return the same status as GET.
As to the original issue: I'm lukewarm on it, to be honest. There are persistent requests for external content references, but this is not that. If we don't have this, but instead a link, I'd argue that we no longer have a LDP-NR at all, just a LDP-RS that has an additional rdf:type (PCDM folks antennae go up). That might be fine.
@barmintor , I share a similar sentiment in that my primary concern is supporting the requirement for external content. I would be interested to hear why you do not believe the specified approach is not that.
I believe the use case we are trying to address with the message/external-body
Content-Type is the ability to create an LDP-NR in the repository, the body of which is a resource found outside of the repository. Additionally, requests for that LDP-NR are expected to redirect to the external content.
The current approach does this in a standardized way with (RFC3230); but if there is another suggestion that satisfies the requirement, let's discuss.
@awoods I'm happy to discuss in another issue?
@barmintor , ok. Does that imply that we close this one as: "leave as-is for now"?
@awoods I don't think the spec is acceptable as-is, but if the intent of the new issue is to fix it, that would be fine.
@barmintor @awoods I don't believe the spec currently spells out what happens on HEAD requests to resources with external bodies, so as-written it would suggest it's up to the implementation to do something reasonable.
@birkland I think there is an obvious choice (return a 307). I would be happy to see that written down.
@birkland : To close this issue, may I suggest:
@awoods Sure, though I think feedback from @barmintor and @acoburn would suggest that use cases around external content may be addressable just by creating an appropriate LDP-RS that link to the content, rather than creating a redirecting LDP-NR. So to your list a step 0?
external-body
shall remain in the spec.. if the answer is 'yes', then we go on to 1, 2, and 3 to clarify the existing spec?
@birkland :+1: the subsequent list might be worth pursuing, just not the question in this issue. I would also note that there is nothing in RFC2017 that mandates a redirect, as far as I can tell.
Part of the problem here is that we here lots of "use cases" to the effect of "I want to put stuff outside the repo but still be able to tell people that I use Fedora."
What we need are use cases that lay out explicit patterns of interaction. I don't think we have many of those. I would like to encourage people to share common patterns for interactions of this ilk, but I don't even know if that is possible because I don't know what people are trying to do when they ask to "put stuff outside the repo". Do they want "projection"? Linking? Redirects? A big rubber band that goes from one server to the other?
@ajs6f : I agree around the ambiguity of what people actually want. I will also note that based on our understanding of what they want, we have provided the external-body
mechanism.
They are using it and it seems to fit the bill.
@awoods it only fits the bill if the client can resolve the URL, right? This is actually a major roadblock to migration at CU.
Shall I go ahead and create new issues for updating/clarifying the existing spec (using Andrew's list of three), and also for the more existential "Decide if creating LDP-NRs that redirect via external-body
shall remain in the spec as a MUST", then close this issue?
@birkland I think the "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST" is this ticket, right?
@barmintor : can you describe the "major roadblock to migration at CU" comment?" ..that is starting to sound like a real use case.
New issue #40 created to fix existing spec.
Scoping this issue to "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST", my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate.
@acoburn : that is a fair point. Currently on the table are both ends of the spectrum:
external-body
with redirectsIn the new issue #40 (thanks @birkland !) we can collect the requirements, including the ones you just noted to help drive towards a more comfortable resolution.
@awoods I think iana:describes
must be on the LDP-RS that describes some other binary resource. If we don't redirect, we can still use external-body. Using other links really means the resource in question is not a LDP-NR.
@barmintor : thanks, I updated my comment.
Using other links really means the resource in question is not a LDP-NR.
@barmintor, this confuses me: I see that it would mean that we can't infer from LDP alone that it is an LDP-NR, but how can we infer that it definitely is not?
@ajs6f if we have an existing HTTP mechanism for documenting a reference in Content-Type: external-body
, and another one for redirecting in 3xx with Location: ...
, and we're not a RDF source, and we're not binary content, I don't know what we're talking about anymore.
@barmintor That's not my point. My question is: if I have an RDF source (in an LDP server) and it has a link (of some non-iana:describes
variety) to something elsewhere in the world, what on earth lets you infer that that something else isn't an LDP-NR? It doesn't seem to me that you know anything about that other resource, on an LDP basis. It could be an LDP-NR or a fried egg, for all we know.
@ajs6f you have me backwards. I'm saying that if a resource itself doesn't have those things, that resource is not a LDP-NR.
You're saying that if a resource doesn't something that describes it it cannot be an LDP-NR?
No, I'm saying if GET X is not local binary content, and GET X doesn't have Content-Type: external-body
, and GET X doesn't have Location: ...
, but GET X has linked data that points to other resources, X is not a LDP-NR.
Okay, this side issue between @barmintor and myself has been discussed in IRC, and it turns out to be a disagreement about what kinds of resources can use what kinds of links to set up a relationship of description/described, which is not at all an important part of this ticket. Sorry for sidetracking things!
Reading through this issue, and trying to figure out if this one should be closed since it is superseded by https://github.com/fcrepo/fcrepo-specification/issues/40 and https://github.com/fcrepo/fcrepo-specification/issues/41. If that's the case, shall we close this issue?
@ruebot Looking through the comments, it looks like there isn't consensus on keeping MUST
in the language:
Quoth @acoburn
it seems to me that this feature fits rather uncomfortably with the other components of the specification.
Quoth @barmintor
If we don't redirect, we can still use external-body (I think that's suggesting that returning the originally supplied
message/external-body
rather than a redirect would be a reasonable thing to do)
my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate
So how about this: Change to language MAY
or SHOULD
(then close this ticket), clarify the specification with #40 , and explain this all in non-normative text in #41 ?
In section 1.3, HTTP POST:
I'm curious as to the motivation for calling out this specific content type? I initially read it as: a Fedora implementation, upon receiving a POST with
Content-Type: message/external-body
would dereference the URL and ingest the content of that URL (as an LDP-NR?). Later I understood that the use case was to support the idea of Fedora 3 external data streams.I wonder if it would be better for this to be a MAY, instead of a MUST, or simply remove this paragraph all together? It seems odd to me that the specification requires all Fedora implementations to carry over a use case from Fedora 3. Or, I may just not fully understand the motivation.