cognitect-labs / aws-api

AWS, data driven
Apache License 2.0
731 stars 100 forks source link

Breaking change in `c7e21d2` - HTTP 304 is not a redirect but rather a proper response code #240

Closed p-himik closed 1 year ago

p-himik commented 1 year ago

Dependencies

com.cognitect.aws/api                               {:mvn/version "0.8.673"}
com.cognitect.aws/endpoints                         {:mvn/version "1.1.12.456"}
com.cognitect.aws/s3                                {:mvn/version "847.2.1365.0"}

Description with failing test case

I'm proxying S3 objects with my own web server and I forward both etag and last-modified headers from the client. In some instances, it makes S3 return HTTP 304 which is reasonable. After this commit, HTTP 304 is treated as an anomaly and makes it impossible to use client-side cache.

dchelimsky commented 1 year ago

Thanks for the report. We'll have a fix out this week.

dchelimsky commented 1 year ago

Would you please post the response you're getting in this case, along with (meta response)?

p-himik commented 1 year ago

Sure, although I can easily do that only on my development setup where I use MinIO (with some private fields elided). But the behavior can be seen on proper AWS S3 as well.

The response:

{:cognitect.anomalies/category :cognitect.anomalies/incorrect}

Its meta:

{:http-request  {:request-method :get
                 :scheme         :http
                 :server-port    9000
                 :uri            [...]
                 :headers        {"x-amz-date"           "20230531T101402Z"
                                  "if-modified-since"    "Tue, 09 May 2023 10:28:58 GMT"
                                  "if-none-match"        "\"2db6f39c4ad8bfe09eb1306ecacb64b2\""
                                  "host"                 "localhost"
                                  "x-amz-content-sha256" [...]
                                  "authorization"        [...]}
                 :body           nil
                 :server-name    "localhost"}
 :http-response {:status  304
                 :headers {"server"                    "MinIO"
                           "x-content-type-options"    "nosniff"
                           "strict-transport-security" "max-age=31536000; includeSubDomains"
                           "accept-ranges"             "bytes"
                           "etag"                      "\"2db6f39c4ad8bfe09eb1306ecacb64b2\""
                           "x-amz-request-id"          "176432D2F6B6204B"
                           "date"                      "Wed, 31 May 2023 10:14:02 GMT"
                           "vary"                      "Origin"
                           "last-modified"             "Tue, 09 May 2023 10:28:58 GMT"
                           "x-xss-protection"          "1; mode=block"
                           "content-security-policy"   "block-all-mixed-content"}
                 :body    nil}}
dchelimsky commented 1 year ago

And, just so I'm clear, the response you got before was whatever the content was and it just came with a 304 instead of a 2xx?

p-himik commented 1 year ago
  1. Client issues a request without any cache-related headers
  2. Client receives a response with a body, status 200, and cache-related headers
  3. Client issues a request for the same resource but now with cache-related headers
  4. The S3 server sees those headers and figures that the resource is not changed - it answers with status 304 and no body
dchelimsky commented 1 year ago

Got it! Very helpful. Thank you.

dchelimsky commented 1 year ago

I'd not used these headers myself so I didn't know how to. Now I do ;)

For anybody else reading, the request map for e.g. :GetObject accepts these cache related key/val pairs:

 :IfNoneMatch string,
 :IfUnmodifiedSince timestamp,
 :IfModifiedSince timestamp,
 :IfMatch string,

... which are coerced to http request headers e.g. "if-none-match", etc

p-himik commented 1 year ago

Wouldn't just allowing 304 though suffice, without having to peer into the headers?

dchelimsky commented 1 year ago

Wouldn't just allowing 304 though suffice, without having to peer into the headers?

100%! My comment about the headers was to provide context for readers who come across this issue down the road, not to hint at any implementation choice. Make sense?

p-himik commented 1 year ago

Ah, yeah. :)

dchelimsky commented 1 year ago

FYI, this opened up a design discussion that is taking a bit of time. I'm not sure where it's going to land yet, but in the mean time, you can test for (= 304 (-> response meta :http-response :status)) to make whatever programmatic decision you need to make.

p-himik commented 1 year ago

Is this discussion public? Curious to see the what the argument is about. Personally, I don't see any reason not to just check for 304 in the most direct way given that HTTP status codes are pretty much set in stone.

I have simply rolled back to an older version for now, so I hopefully don't have to change the app's code at all when this is resolved.

dchelimsky commented 1 year ago

Right now it's a private discussion among maintainers, but I'm happy to try to answer your questions.

Some context: whatever changes we make have to work consistently for all operations on all AWS services, which have different rules about how they deliver error information.

The AWS Java SDK (and I assume the others, still need to validate that) returns exceptions for everything >= 300, including 304. Before the recent change to aws-api, we were using 400 as a threshold, which meant that when AWS sent us an error payload with a 3xx (as they do for a 301), we were treating that as a success and delivering unspecified information to the user. This is the primary focus of the design discussion: what should we be doing for 3xx?

In terms of checking for a 304, (= 304 (-> response meta :http-response :status)) is the most direct way for client code to check for a 304 today, and is unlikely to change (though I'm not making any promises until we're ready to make them). This is quietly documented in the Usage section of the README, which shows that the http-response is in the metadata.

How is your code checking for the 304?

dchelimsky commented 1 year ago

This is quietly documented

We're going to make that less quiet once we define what we're doing here.

p-himik commented 1 year ago

Ah, makes sense in the context of all AWS services, thanks. I could be quite myopic here because I only use S3 currently.

(= 304 (-> response meta :http-response :status)) is the most direct way for client code to check for a 304 today

Right, I meant checking that in the library. To exclude the 304 status code from the ">= 300" range and treat it as a success. I can see how a redirect might not be considered a success, but I can't stretch that logic in my head enough to also cover the 304 status. After all, for the response to be 304, the request has to be made deliberately with the right headers, so the lib's client should expect that and in principle in that case 304 is not an anomaly but rather an expected behavior, unless, perhaps, if it's received in the absence of all the required headers.

How is your code checking for the 304?

That's the thing - it doesn't. As I mentioned, all my app's code is doing here is basically proxying S3 resources - it signs the URL for a requested resource, makes a request to AWS, and forwards the response to the client in full and mostly preserved. The only reason I'm checking for anomalies at all is to log truly anomalous things that shouldn't happen.

dchelimsky commented 1 year ago

Hey @p-himik. We've been studying this all this time, and we've decided that we're going to keep the new behavior, treating 304 as an error.

The rationale (also documented now in the README), is that AWS documents 304 as an error, and all of the AWS SDKs actually throw exceptions in this case.

We've also documented this as a breaking change in the CHANGELOG.

I understand and agree with your rationale for not treating 304 as an error. So does everybody else involved in the discussion. In the end, however, we all agreed that our guiding principle needs to be alignment with AWS semantics, regardless of whether AWS's choices align with expectations based on HTTP semantics.

I'm going to close this issue, but please feel free to keep commenting here if you wish, or we can discuss it further in the #aws channel in Clojurians Slack.

dchelimsky commented 1 year ago

There is one other thing we added: :cognitect.aws.http/status is now part of the body of all anomalies, so in the 304 case you'll get (at least):

{:cognitect.anomalies/category :cognitect.anomalies/conflict
 :cognitect.aws.http/status    304}

So now application code can look for either the anomaly category or the status to make branching decisions.