for-GET / http-decision-diagram

An activity diagram to describe the resolution of HTTP response status codes, given various headers.
https://github.com/for-GET/http-decision-diagram/blob/master/doc/2013-06-10-http-hell-no.md
Apache License 2.0

How to deal with 404? #47

Closed mstade closed 7 years ago

mstade commented 7 years ago

Am I right in understanding that 404 should be dealt with before entering the system block? So in effect, there's a decision to be made before entering the decision state machine, of whether or not to enter it in the first place?

mstade commented 7 years ago

Actually, is this related to #11? I.e. the first decision point is_service_available is arguably a good point to say no, but instead of responding with 503 we choose to respond with a 404, because the service isn't available simply because it can't be found.

Or is this an entirely different decision graph altogether? (Short, as it may be.)

christopheradams commented 7 years ago

A 404 can be determined later, after the missing:true decision. is_service_available might depend on whether some part of the system (like a database) is unreachable or producing errors.

andreineculau commented 7 years ago

@mstade what @christopheradams said. 404 is only in K01 http://for-get.github.io/http-decision-diagram/httpdd.fsm.html and "not found" and "service not available" are two different things. Similarly, "dealing with 404" is not something for the server to do. It's the client that deals with response status codes. The server needs to find the appropriate one, given a context and decision blocks.

That you want to alter status codes, for reasons like #11 is a different matter.

mstade commented 7 years ago

You misunderstand me I'm afraid. I'm not talking about "dealing with" as in how do I react to a 404 response, but rather at what point does the server decide to send a 404 because it can't find the resource being requested – is it supposed to be in this state machine at all, or is it a response that comes before this? I guess the underlying question is: is this diagram a representation of a single resource state machine, or is it a representation of a server which may comprise many resources?

andreineculau commented 7 years ago

And I'm afraid I did not misunderstand you. I was just highlighting that your premise was wrong. The entire goal of this decision diagram is to take you away from thinking about/deciding on status codes. Give it enough input, and it will tell you what HTTP status code is appropriate.

> is this diagram a representation of a single resource state machine or is it a representation of a server which may comprise many resources?

Neither. It's a flow for "building the response of an HTTP transaction", which is obviously limited to one resource identified by the request URL, but can span multiple servers, e.g. a 503 can be spit out by a reverse proxy, and not by the origin.

FWIW "a resource state machine" is an entirely different beast altogether.

mstade commented 7 years ago

Yes, you're quite possibly right about me misunderstanding the entire purpose of this diagram. Much appreciated if you could help me understand it better!

> It's a flow for "building the response of an HTTP transaction"

Right, ok, but surely a 404 response is part of that? It seems strange to me that K01 is the only situation that would happen. For instance, how could you possibly determine whether C2 (accept_matches) is ok if you can't find the resource in the first place?

andreineculau commented 7 years ago

Am I correct to think that when you say "you can't find the resource in the first place" you mean that you didn't set up a route handler? E.g. your router can handle /items and /items/:id, but if nothing matches it will return 404. If that is so, imagine that the router is nothing more than a proxy: I take a request for URL /foo/bar/ and I proxy it to some internal function. That proxy can run the decision diagram on its own. It will skip accept_matches, because within its own boundaries it could in theory reply with whatever the internal function returns. If the request is for /foo/qux, then the resource is missing, because there's no routing for it. The internal function then runs its own decision diagram, etc.

As for the question itself "how could you possibly determine C2", if you cannot determine that without doing some lookup,

  1. there is nothing wrong at all with you doing the lookup, finding nothing, and spitting out 404, before 406. This diagram says nothing as such
  2. there is nothing wrong at all with you doing the lookup, finding nothing, returning false (accept doesn't match, because the lookup itself failed, thus I don't actually know what I will be serving) and thus ending up with a 406.
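The two options above could be sketched roughly like this. It is only an illustration: `RESOURCES`, `lookup`, and both `respond_*` functions are hypothetical names, not part of the diagram or of any implementation, and the `Accept` handling is reduced to an exact string match.

```python
from typing import List, Optional

# Hypothetical store: URL path -> media types the resource can serve.
RESOURCES = {"/items": ["application/json"]}


def lookup(path: str) -> Optional[List[str]]:
    """Return the media types for a path, or None if nothing is found."""
    return RESOURCES.get(path)


def respond_option_1(path: str, accept: str) -> int:
    """Option 1: a failed lookup yields 404 before 406 is even considered."""
    media_types = lookup(path)
    if media_types is None:
        return 404
    return 200 if accept in media_types else 406


def respond_option_2(path: str, accept: str) -> int:
    """Option 2: a failed lookup makes accept_matches false, hence 406."""
    media_types = lookup(path) or []
    accept_matches = accept in media_types
    return 200 if accept_matches else 406
```

Both functions agree for a resource that exists; they differ only in which status a missing resource ends up with.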

I hope there is no reference anywhere that says that order of HTTP statuses matters in this diagram. If it does, please report it.

PS: There are at least a few concepts that you're overlapping and I'm not sure how I can clarify them via GitHub issues. Based on that observation, I'd appreciate it if you could give some context as to what you're doing, what you want to achieve and where you got stuck.

mstade commented 7 years ago

> There are at least a few concepts that you're overlapping and I'm not sure how I can clarify them via GitHub issues. Based on that observation, I'd appreciate it if you could give some context as to what you're doing, what you want to achieve and where you got stuck.

Yes you're right and I apologize – I'll do my best to try and clarify but please forgive me, I fear it may be more of a rant than anything. :o/

I've implemented a state machine that uses this diagram as the spec, in a sense. The input to this machine is a request, and as a result you get a response back. It is I suppose, at least superficially, not very different from something like webmachine.

Instances of this machine are then put into what is effectively a routing table (it's a trie, really, but it doesn't matter) that maps URLs to the specific machine (i.e. resource) meant to handle requests to that URL. (Currently, there's a one-to-one mapping of URL to resource machine, although I suppose that's not necessarily a requirement – it certainly makes it easier to reason about though.)

In the case you feed this router a URL that isn't mapped, it won't be able to give you a resource. So what I've done in this case is to have effectively a global "not found" resource, which technically is just like any other resource. I suppose this is where I get confused because it's not quite like any other resource – is it? Of course, there's overlap – a 503 might still occur I guess and certainly a bug might throw an exception in which case I suppose a 500 would make sense, but I wouldn't really expect a 308 for instance.
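A minimal sketch of that setup, for illustration only. A plain dict stands in for the trie, a function returning a status code stands in for a full resource machine, and every name here (`Router`, `items_resource`, `not_found_resource`) is hypothetical rather than taken from the actual implementation.

```python
from typing import Callable, Dict

Request = Dict[str, str]             # e.g. {"method": "GET", "path": "/items"}
Resource = Callable[[Request], int]  # stand-in for a full decision machine


def items_resource(request: Request) -> int:
    """A mapped resource; a real one would walk the whole decision flow."""
    return 200


def not_found_resource(request: Request) -> int:
    """Global fallback resource that models the 'missing' outcome."""
    return 404


class Router:
    """Maps URL paths to resources and falls back to a default resource."""

    def __init__(self, fallback: Resource) -> None:
        self.routes: Dict[str, Resource] = {}
        self.fallback = fallback

    def add(self, path: str, resource: Resource) -> None:
        self.routes[path] = resource

    def dispatch(self, request: Request) -> int:
        # Unmapped paths are handled by the fallback, like any other resource.
        resource = self.routes.get(request["path"], self.fallback)
        return resource(request)


router = Router(fallback=not_found_resource)
router.add("/items", items_resource)
```

Here `router.dispatch({"path": "/nope"})` goes through the fallback resource, so the router itself never has to choose a status code.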

You're right, I'm conflating things – my apologies. I guess I'm just having a hard time reading this diagram and seeing how 404 fits in. Likewise – and again I apologize for conflating things – I'm not sure about 500. In the diagram it's in a few places, but couldn't it technically happen at pretty much any point? I.e. something breaks down irreparably and there's nothing meaningful to glean from it other than "shit hit the fan" so here's a 500?

> I hope there is no reference anywhere that says that order of HTTP statuses matters in this diagram. If it does, please report it.

Wait, you start at B26 don't you, and then work your way through the decisions? In that case surely that's the order and it does matter, no?

andreineculau commented 7 years ago

Hej Marcus, Glad to hear you have an implementation and great if you can ever opensource it!

To your point, it's true that the proxy (your route-to-machine resource handler) acts just like another instance of the decision diagram. The fact that this type of resource may need only 1% of the decision diagram and it would give the same result doesn't change anything about the diagram. All that is needed is to have the callbacks implemented correctly, and in the case of the proxy, if we talk about 308, the moved_permanently callback will simply return false all the time. By all means, all callbacks might be coded to return true/false without any real checks, just to model the intended response. Waste of CPU cycles? Maybe, but it depends on your implementation how many ms that translates to.
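As a rough sketch of callbacks that return constants: the callback names `is_service_available`, `missing` and `moved_permanently` come from this thread, while the classes and the `decide` function are hypothetical, and the flow is drastically shortened compared to the real diagram.

```python
class Callbacks:
    """Default callbacks for a hypothetical, heavily shortened flow."""

    def is_service_available(self, request: dict) -> bool:
        return True

    def missing(self, request: dict) -> bool:
        return False

    def moved_permanently(self, request: dict) -> bool:
        return False


class NotFoundCallbacks(Callbacks):
    """Callbacks for the fallback resource: always missing, never moved."""

    def missing(self, request: dict) -> bool:
        return True

    # moved_permanently is inherited and stays False, so 308 cannot occur.


def decide(callbacks: Callbacks, request: dict) -> int:
    """Walk a tiny subset of decisions and return a status code."""
    if not callbacks.is_service_available(request):
        return 503
    if callbacks.missing(request):
        if callbacks.moved_permanently(request):
            return 308
        return 404
    return 200
```

The same `decide` function serves both kinds of resource; only the callbacks differ.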

Re:500 - yes, it can happen anywhere, so a try-catch should be in place. AFAIK, that's what any HTTP server does by default. The diagram has instances of 500 for callbacks that have no reason to fail e.g. process_delete
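A minimal sketch of such a catch-all, assuming the decision flow is exposed as a plain function; `run_machine` and `broken_decide` are hypothetical names used only for illustration.

```python
from typing import Callable


def run_machine(decide: Callable[[dict], int], request: dict) -> int:
    """Run a decision function, mapping any unexpected failure to 500."""
    try:
        return decide(request)
    except Exception:
        # Something broke at an arbitrary decision point.
        return 500


def broken_decide(request: dict) -> int:
    """Stand-in for a flow whose callback raises unexpectedly."""
    raise RuntimeError("database connection lost")
```

With this wrapper, `run_machine(broken_decide, {})` yields 500 regardless of which callback failed.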

Re:order - obviously there's an order implied by the flow, but there is nothing that forbids me, as the diagram author, or you, as the implementor, from switching B24 with B23 for instance (and throwing 414 before 503), except for common sense, and in certain cases probability/optimization. If you want to first check whether the request is authorized, before you check that the resource can handle the request media type, nothing stops you. That's what I mean by "order of status codes does not matter".
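To illustrate the point about order, here is a small sketch in which the decisions are an ordered list of (check, status) pairs. All names and the `MAX_URI_LENGTH` limit are assumptions made for this example; only the 503/414 pairing comes from the comment above.

```python
from typing import Callable, List, Tuple

Check = Tuple[Callable[[dict], bool], int]

MAX_URI_LENGTH = 2000  # arbitrary limit, chosen only for this example


def service_available(request: dict) -> bool:
    return request["available"]


def uri_short_enough(request: dict) -> bool:
    return len(request["uri"]) <= MAX_URI_LENGTH


def first_failure(checks: List[Check], request: dict, default: int = 200) -> int:
    """Return the status of the first failing check, else the default."""
    for predicate, status in checks:
        if not predicate(request):
            return status
    return default


default_order: List[Check] = [(service_available, 503), (uri_short_enough, 414)]
swapped_order: List[Check] = list(reversed(default_order))
```

For a request that fails both checks, `default_order` produces 503 and `swapped_order` produces 414. Both are legitimate answers; for a request that passes every check the order makes no difference.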

mstade commented 7 years ago

Hej Andrei! :o)

So let me see if I get your point – this decision graph is like a general purpose graph, but there may well be a number of "known" cases where the decision graph may always be a subset, or perhaps even slightly/entirely different? I suppose the cache control graph is an example of that. It makes a lot of sense to me actually, and makes me think that the path I've been going down with a "404 resource machine" is probably the way to go, and it could possibly be a different state machine altogether (the logic that "runs" the machine is agnostic, it just cares about the FSM interface.)

My plan is certainly to open source this work, and I'd be happy to show you. In fact I hadn't realized till now that you're in Stockholm – I'll buy you a beer for the opportunity to pick your brain! ;o)