What to do about no-ops

caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS

https://caddyserver.com

Apache License 2.0


What to do about no-ops #5182

Closed mholt closed 6 months ago

mholt commented 2 years ago

I just wrote a new article for our wiki: https://caddy.community/t/why-caddy-emits-empty-200-ok-responses-by-default/17634

It explains why Caddy does what it does for "no-op" requests; that is, why Caddy emits 200 OK even when it wasn't configured to do anything.

This issue is here to discuss one more time whether that behavior could be improved upon. I recently heard a use case from a contact within a company exploring Caddy that the 200 behavior was surprising and made it difficult to troubleshoot whether the request was being handled partially or not at all (i.e. what routes was it taking that it ended up as a no-op?). Misconfigured routes -- maybe matchers that don't match what is expected -- or missing handlers can cause confusion.

I'd be open to discussing a non-standard 2xx status code to make it more obvious that the server is working, but lacks configuration to invoke an application or originate content. For example, 290 NOP? I dunno.

I don't love this because clients won't know what to do with it. Some clients just look at the first digit to get the gist of what happened. Others expect a specific 20x. Who knows what this would break.

For reasons stated in the wiki article above, I'm not inclined to change this behavior.

Personally, I think a better solution than changing the status code is to provide better config debugging tools.

Config assertions: given a test corpus, ensure tests pass before applying config. #4537
Request tracing in the debug-level logs: emit a log when matchers and handlers are evaluated.

The latter might not be too hard to get something simple working, so I'll push a branch later with my tinkering.

Feedback welcome in the meantime.

Prior work/discussion in:

#3445 and #3446
#3879
#4026
#3226

francislavoie commented 2 years ago

Prior work: https://github.com/caddyserver/caddy/issues/3445 I had tried to implement a debug log when requests go unhandled, but I found it non-trivial to implement correctly.

mholt commented 2 years ago

Ah yes!! Thanks for linking that. I was looking for that but had enough trouble finding the other issue too. Updated original post to link to more issues.

amadsen commented 1 year ago

I love Caddy and appreciate your work, but I really don't understand this logic:

(We often get requests to change the default to 404, but that means “Not Found” – but we don’t know that anything wasn’t found, because we weren’t looking for anything!)

This feels like you're over-thinking it. Caddy received a request for a resource. Caddy was not configured to respond to that request. We were looking for the resource requested in the request. There was no configured response. 404 is the only logical response - no configured resource was found for the request that was made.

This is very different from the situation where Caddy is configured to respond to the request (via a file or reverse proxy) and that results in an empty response. That would (pending other config) be an appropriate time to respond with a 200 and an empty response body.

The current behavior falsely indicates that Caddy was explicitly configured to respond to the request with an empty response - that is, that it successfully found an empty resource, rather than not finding any resource and sending an empty resource by default.

amadsen commented 1 year ago

Looking at it from another perspective that you discuss in the wiki post:

One is grounded plainly in HTTP spec. 200 OK [literally means “the request has succeeded.”](https://www.rfc-editor.org/rfc/rfc9110#name-200-ok) That is indeed the case here even if the server’s configuration didn’t have anything specific for that request. The server successfully received, decoded, parsed, and evaluated the request. It just wasn’t configured to do anything.

This fundamentally conflates the server - Caddy - with the resource (which, in this case, Caddy doesn't have enough information to find). A web server's contract is to serve resources. "404 Not Found" means the server was unable to find (or supply) the requested resource. From the perspective of the server - Caddy - 404 is not an error; it is an indication that the server successfully understood and processed the request, but did not find anything for the resource requested. Not being able to understand (501) or process (422) the request would be a different state represented by a different status. The server not being configured to do anything with a request is the literal definition of a 404.

mholt commented 1 year ago

Hey @amadsen Thanks for the comments!

Caddy received a request for a resource.

Maybe; or maybe not. Not all requests are for resources. Most non-GET methods are not for resources, for example.

This feels like you're over-thinking it. Caddy received a request for a resource. Caddy was not configured to respond to that request. We were looking for the resource requested in the request. There was no configured response. 404 is the only logical response - no configured resource was found for the request that was made.

Yes, so this whole argument makes sense from an application layer perspective. However, as a plain HTTP server, Caddy can't make assumptions about application semantics.

This fundamentally conflates the server - Caddy - with the resource (which, in this case, Caddy doesn't have enough information to find). A web server's contract is to serve resources.

Ok, I see the confusion here. Again, what you're saying makes sense from an application perspective.

Actually, a web server's contract is to connect HTTP with an application (some sort of handler that does something with a request). The file server serves files, the reverse proxy gets a response from a backend, even a simple "static_response" handler writes a hard-coded response, etc.

If there is no application configured to handle an HTTP request, the default response is 200 OK, meaning that "HTTP is working, yes" -- there's just no application value to the response, so it's empty.

I know this is different from what you're used to.

The server not being configured to do anything with a request is the literal definition of a 404.

404 means "that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists." The linked section about target resources talks about application logic -- not something that the bare-bones HTTP server can do.

200 means, "I got an HTTP request and handled it according to my configuration correctly," and an empty 200 probably means, "I was not configured to do anything, so here's nothing. I have no application logic."

Writing application logic into a vanilla HTTP server would be a mistake.

The current behavior falsely indicates that Caddy was explicitly configured to respond to the request with an empty response - that is, that it successfully found an empty resource, rather than not finding any resource and sending an empty resource by default.

Remember, it wasn't even looking for a resource -- it just did the HTTP successfully.

Think of it kind of like a 0 value. It's not null (because the server IS working), but it's not non-zero (it wasn't configured to do anything). It's just the default value.

amadsen commented 1 year ago

HTTP relies on URIs - Uniform Resource Identifiers - to indicate a requested resource, whether that resource is a file or an application behavior. Methods act on those URIs. A web server's job is to connect the request with the underlying resource. It is, by definition, a middleware. The underlying resource may not even speak HTTP - such as a file system, file server, or [Fast]CGI application - and should not be relied upon to provide HTTP semantics directly; that is the job of the web server.

Remember, it wasn't even looking for a resource -- it just did the HTTP successfully.

It is impossible to have an HTTP request that doesn't refer to a resource (URI). HTTP - the protocol - may have been successfully "done", but the resource was not found (because it wasn't defined) so the request was not successful - it was not found.

The migration of web server (HTTP) semantics directly into applications ("services") is a relatively modern phenomenon. I don't have a problem with it, but it contributes to a lot of confusion that can occur because of equivocation in the term "application". According to the OSI network model all of HTTP operates at the "application" layer. Most "web applications" have for many years been composed of multiple layers of executables - web servers, script engines, databases, "services", etc. - which may or may not be referred to as an "application" in a given context. Because Caddy is speaking HTTP - and may be the only executable that does so for a given request (as would be the case when lacking configuration for a requested URI) - it is entirely appropriate for it to provide an HTTP response code indicating that it couldn't find anything for the requested resource. Conversely - for comparison's sake - haproxy in tcp mode is not speaking HTTP and therefore it would be inappropriate for it to provide an HTTP response.

I think it is useful to keep in mind that these specifications were developed in tandem with early web servers (and web browsers), with decades of opportunity to adjust both the specifications and the servers. If your interpretation of the specifications is highly discordant with the behavior of those servers, that is a strong signal that you might not be reading them as intended.

I strongly dislike the default of responding with a 200 response code and an empty response body. In my opinion it violates the principle of least surprise, both by my reading of the specifications and precedent. I think it should be changed to a 404. However, I highly respect the thought and work that has been put in to create Caddy (and recognize it wasn't done by me). While I hope my perspective is persuasive and useful, I respect you and encourage you and the team to implement as you see fit.

amadsen commented 1 year ago

200 means, "I got an HTTP request and handled it according to my configuration correctly," and an empty 200 probably means, "I was not configured to do anything, so here's nothing. I have no application logic."

Another way of stating my argument is the configuration is (a layer of) application logic defining the HTTP resource and a lack of configuration specifically means no resource was found.

amadsen commented 1 year ago

Also, I have run into the situation where a load balancer or other intermediate HTTP server unexpectedly responds with a 404 (or 500, or 200) and therefore violates client expectations (perhaps resulting in blown error budgets and/or difficult-to-debug situations). From that perspective, I very much appreciate Caddy's effort to make as few assumptions as possible.

mholt commented 1 year ago

The migration of web server (HTTP) semantics directly into applications ("services") is a relatively modern phenomenon. I don't have a problem with it, but it contributes to a lot of confusion that can occur because of equivocation in the term "application".

HTTP without an application is a no-op. There's nothing to do, whether back then or today.

So, what is the default HTTP response?

(Nothing official defines one AFAIK.)

An empty response to let clients know the HTTP server is working seems most reasonable to me.

I'm sorry it's confusing, but I do think other answers here are just not as "correct".

I think the best solution will be better troubleshooting tools... like tracing, more helpful logs, etc.

Because Caddy is speaking HTTP - and may be the only executable that does so for a given request (as would be the case when lacking configuration for a requested URI) - it is entirely appropriate for it to provide an HTTP response code indicating that it couldn't find anything for the requested resource.

That assumes Caddy is one monolithic, single-purpose application, when it's actually a JSON API and CLI for HTTP (and other things too, such as TLS, which is irrelevant here).

If your interpretation of the specifications is highly discordant with the behavior of those servers, that is a strong signal that you might not be reading them as intended.

I think they were more near-sighted, personally. (I don't blame them, I don't think they understood the future like it actually is today.)

Who wants a general-purpose web server that doesn't do nothing by default? If we did something by default, you'd (or other people would) be just as confused and frustrated because the server is doing something it's not configured to do.

I think the fact that there is no official, clearly-defined "default" HTTP response for a "working, but unconfigured" server is a pretty good sign that the spec writers did not have the foresight of modern systems. Again, I don't blame them -- but I do still think the "0-value" server behavior is most correct.

Also, I have run into the situation where a load balancer or other intermediate HTTP server unexpectedly responds with a 404 (or 500, or 200) and therefore violates client expectations (perhaps resulting in blown error budgets and/or difficult-to-debug situations). From that perspective, I very much appreciate Caddy's effort to make as few assumptions as possible.

Ah, right -- so like what I was saying above in this reply before I saw this.

The server can do an empty response and be confusing (though I don't think it's confusing).
The server can do a non-empty response and be confusing (because it wasn't configured! "where is this coming from!!??" kind of thing)

Pick one :upside_down_face:

(I've had too many bad experiences with the second.)

mholt commented 1 year ago

I should mention, after discussing with Francis in Slack, that your arguments are compelling -- I think it just comes down to "we see the Web differently." :man_shrugging:

But they are well-reasoned, well-cited arguments that do make sense from a certain point of view.

I think there's just ambiguities between theory and practice (spec and implementation) especially over long periods of time (6+ months, heh) and that's what we're running into here.

I appreciate the content and manner of your discussion :+1:

amadsen commented 1 year ago

I agree that this is a "we see the web differently" situation and will gladly accept that as reasonable. I very much appreciate you taking the time to consider my perspective - in addition to the amazing work you're doing in general on Caddy. When I consider some of the annoying debugging experiences that I've had with intermediate http servers, I can see where you are coming from better. I still prefer 404 as a better zero-value / default (I think it is one of 404's many intended uses - where "many" is a potential problem), but agree that either situation can be confusing.

Thanks again!

lowne commented 1 year ago

There must be something I'm missing here, because after skimming this and related issues it seems to me that the discussion is happening at the wrong abstraction level.

So, what is the default HTTP response? (Nothing official defines one AFAIK.)

I don't think there can even be a "default response" because at the spec level there is no such thing as an unhandled request (defined as "I, the server, won't even look at it, but otherwise I'm working just fine"). Once you agree to speak HTTP, you must reply in some fashion to every request that follows the spec.

Which hopefully shows that the discussion is really about how to configure routes (ie in the Caddyfile, by the user) and not what caddy's maintainers should hardcode in the executable.

Now, the user should either have a catch-all respond directive, or make sure that all subroutes in the hierarchy are handled in some way, but in practice we see that's not the case, because users are stupid (me foremost) or forgetful or disorganised and because the Caddyfile is allowed to have unhandled routes.

So: either strictly forbid unhandled routes in the config (as in, caddy refuses to start) - which seems impractical for a number of reasons - or "give in" to popular demand/the reality on the ground, which means a server-wide option for what to do with unhandled routes, e.g.

servers {
  unhandled_routes {
    error 404
  }
}

(note how it'll still go through handle_errors downstream, which is inelegant, but often necessary); and then we can argue all day about what the default should be when said option is missing.

(my 2c: keeping respond 200 seems straightforward enough, but technically @mholt's arguments seem to indicate that abort (as in, "your config is broken, fix it") would be better, for some definition of "better")

In closing, I must give a huge thank you to y'all for this incredible piece of software. Been using caddy since the v1 days and I'm still amazed at how it keeps getting better (there have been a couple of occasions where I said "dang, I need feature X", went to check the release notes, and lo and behold feature X was in the latest beta! Just incredible)

mholt commented 1 year ago

@lowne Thank you for this thoughtful reply -- I think it makes a lot of sense. And thanks for your nice comments about the project :blush:

I agree we can't really require the user to configure a handler for all possible routes.

Sending unhandled requests through an error chain is interesting, but the status code is up for debate, as it's unclear whether the server is misconfigured or the request was misfired. Either way, we're back to the question of what is correct.

I think at this point I do recommend that if you want a specific way of handling no-ops, you simply inscribe that into your config: respond 404 or whatever you want/think is most correct.

It's becoming clear that this isn't a decision Caddy should make for everyone, as we all see it differently, and it's best left up to the user to decide.

mholt commented 6 months ago

After a year and a half, the discussion consensus seems to be... that there isn't one :sweat_smile:

I appreciate everyone's kindness and professionalism in discussing the matter. There are compelling arguments both ways.

For now I've decided not to change any behavior or semantics. But I did decide to slightly adjust the access log message when a request reached the emptyHandler at the end of a chain (i.e. was not handled explicitly). The message will now be "NOP" instead of "handled request" for those requests. (I took a slightly different approach than Francis did, but I learned from that closed PR so I recognize the contribution there.)

mholt commented 6 months ago

(Oops, the linked commit is only half the solution for some reason. See 399186abfce674ceccb7c9197fa11158d101f485 for the second half.)

evnix commented 4 months ago

After spending over a week debugging what the issue was, including digging through TLS because I thought that had to be it, I finally realized my issue was similar to this. I thought my Caddy config was OK since it returned 200, but luckily I found this thread today and found that I had a misconfigured route.

Just adding the following keywords in case someone comes searching like I did: "caddy blank page", "caddy SSL/TLS blank page".
