dakrone / clj-http

An idiomatic clojure http client wrapping the apache client. Officially supported version.
http://clojars.org/clj-http
MIT License
1.78k stars 408 forks source link

redirect location is being decoded before being followed #582

Closed torkus closed 3 years ago

torkus commented 3 years ago

I have this (roughly):

(def default-strategy org.apache.http.impl.client.DefaultRedirectStrategy/INSTANCE)

(def logging-redirect-strategy
  (reify org.apache.http.client.RedirectStrategy
    (getRedirect [this request response context]
      (println "attempting redirect")
      (println "req" request)
      (println "resp" response)
      (.getRedirect default-strategy request response context))
    (isRedirected [this request response context]
      (println "checking isRedirected")
      (println "req" request)
      (println "resp" response)
      (.isRedirected default-strategy request response context))))

(def url "https://edge.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip")
(client/get url {:redirect-strategy logging-redirect-strategy})

with this output:

checking isRedirected
req GET https://edge.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip HTTP/1.1
resp HttpResponseProxy{HTTP/1.1 302 Found [Date: Wed, 27 Jan 2021 01:47:43 GMT, Content-Length: 0, Connection: close, Location: https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15%2BClassic.zip] [Content-Length: 0,Chunked: false]}

attempting redirect...
req GET https://edge.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip HTTP/1.1
resp HttpResponseProxy{HTTP/1.1 302 Found [Date: Wed, 27 Jan 2021 01:47:43 GMT, Content-Length: 0, Connection: close, Location: https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15%2BClassic.zip] [Content-Length: 0,Chunked: false]}

checking isRedirected
req GET https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip HTTP/1.1
resp HttpResponseProxy{HTTP/1.1 404 Not Found [Content-Type: application/xml, Transfer-Encoding: chunked, Connection: close, Date: Wed, 27 Jan 2021 01:47:44 GMT, Server: AmazonS3, X-Cache: Error from cloudfront, Via: 1.1 9c269b27f2f2f1cf998e691405f9c020.cloudfront.net (CloudFront), X-Amz-Cf-Pop: MEL50-C1, X-Amz-Cf-Id: 0gXNnADgXJEuJ7yyrcFue87TzFcymtMWjVrQpjtM199QriAbb6TK4w==] ResponseEntityProxy{[Content-Type: application/xml,Chunked: true]}}

and from the above we can see that this:

https://edge.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip

is redirected to:

https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15%2BClassic.zip

which exists, but instead this is requested:

https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip

which does not exist, resulting in a 404.

Is this a bug somewhere?

Using clj-http 3.11.0, java 11, clojure 1.10.1

rymndhng commented 3 years ago

It appears that there's an option in the underlying Apache HTTPClient 4.5.x that normalizes the URIs. See https://github.com/apache/httpcomponents-client/blob/4.5.x/httpclient/src/main/java/org/apache/http/impl/client/DefaultRedirectStrategy.java#L162-L164

A quick test shows this behavior:

user=> (org.apache.http.client.utils.URIUtils/normalizeSyntax (java.net.URI. "https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15%2BClassic.zip"))
#object[java.net.URI 0x43631a38 "https://media.forgecdn.net/files/3135/377/MobInfo2-8.3.15+Classic.zip"]

I think you should be able to get around this by passing in a custom http-request-config with the specific option disabled. See https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/config/RequestConfig.Builder.html#setNormalizeUri(boolean)

torkus commented 3 years ago

Did a bit of a copy+paste of the RequestConfig builder function in clj-http/core with setNormalizeUri to ~true~ false and it appears to works fine.

I've opened a PR.

torkus commented 3 years ago

Oh, and thank you for your reply, it was very helpful.

torkus commented 3 years ago

Any idea when/if my change will make it into clj-http?

rymndhng commented 3 years ago

@torkus sorry for the delay, 3.12.0 is out now 🎉

torkus commented 3 years ago

all good, thank you for your help, I appreciate it.

torkus commented 3 years ago

um ... this is embarrassing, but I think I made a simple logic mistake.

I had:

(.setNormalizeUri (or normalize-uri true))

and

(or false true) => true

I'll open a fix and include a test this time.

torkus commented 3 years ago

Fixed version here: https://github.com/dakrone/clj-http/pull/584

(no rush ;)