Closed lread closed 1 year ago
Hi @lread, thanks for the excellent bug report!
When talking to AWS I expect the path portion of the request gets converted to a query string. Maybe?
This guess was correct. Our Fastly CDN proxies to S3 using the HTTP interface for public buckets. That HTTP interface expects + to be encoded, even though the actual object key doesn't have an encoded +.
So in the case of the sample artifact here, the S3 object key is: com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom, but the HTTP URL for it is: https://clojars-repo-production.s3.us-east-2.amazonaws.com/com/github/strojure/undertow/1.4.0%2B123/undertow-1.4.0%2B123.pom.
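To make the key-versus-URL difference concrete, here is a minimal Python sketch of that mapping. The helper name `s3_http_url` is hypothetical, and the bucket host is copied from the URL above. The thread only confirms that + must become %2B; letting `urllib.parse.quote` also encode other reserved characters is an assumption.

```python
from urllib.parse import quote

# Bucket host copied from the URL quoted in this comment.
S3_HOST = "https://clojars-repo-production.s3.us-east-2.amazonaws.com"


def s3_http_url(object_key: str) -> str:
    """Build the HTTP URL for an S3 object key (hypothetical helper).

    Assumption: percent-encoding everything except '/' and unreserved
    characters is acceptable to S3. Only the '+' -> '%2B' case is
    confirmed by this thread.
    """
    return f"{S3_HOST}/{quote(object_key, safe='/')}"


key = "com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom"
print(s3_http_url(key))
```

The object key itself keeps its literal +; only the URL used to fetch it changes.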
I wasn't able to find any official docs on this behavior, but did find a question about it on StackOverflow that provides some context.
I updated our Fastly VCL to rewrite these requests, and they should now work. Our VCL isn't yet in the infra repo; it has to be managed via the Fastly console, so I can't link to the change. But the snippet used is:
if( req.request ~ "GET|HEAD" ) {
  # Rewrite pluses to match s3's expected encoding
  # This has to be a snippet since the Fastly UI doesn't support long strings,
  # and regular strings are urldecoded, turning the escape back into a plus
  set req.url = regsuball(req.url, "\+", {"%2B"});
}
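For readers who don't know VCL, the same rewrite can be sketched in Python. This is an illustration of what the snippet does, not the code running on Fastly, and `rewrite_plus` is a hypothetical name.

```python
import re


def rewrite_plus(method: str, url: str) -> str:
    """Mimic the VCL snippet: for GET/HEAD, replace every '+' with '%2B'."""
    # Like the VCL `~` operator, re.search matches anywhere in the string.
    if re.search(r"GET|HEAD", method):
        # Equivalent of regsuball(req.url, "\+", "%2B"): replace all matches.
        return re.sub(r"\+", "%2B", url)
    return url


print(rewrite_plus("GET", "/com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom"))
print(rewrite_plus("PUT", "/com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom"))
```

Requests using other methods, such as PUT, pass through unchanged.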
Are other characters problematic?
That's a good question. It's unclear to me what other characters they expect to be encoded, but it would be straightforward to address them in the snippet if we discover more.
Woot!:
❯ clojure -Sdeps '{:deps {com.github.strojure/undertow {:mvn/version "1.4.0+123"}}}' -Stree
Downloading: com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom from clojars
Downloading: com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.jar from clojars
org.clojure/clojure 1.11.1
. org.clojure/spec.alpha 0.3.218
. org.clojure/core.specs.alpha 0.2.62
com.github.strojure/undertow 1.4.0+123
. io.undertow/undertow-core 2.3.5.Final
. org.jboss.logging/jboss-logging 3.4.3.Final
. org.jboss.xnio/xnio-api 3.8.8.Final
. org.wildfly.common/wildfly-common 1.5.4.Final
. org.wildfly.client/wildfly-client-config 1.0.1.Final
X org.jboss.logging/jboss-logging 3.3.1.Final :older-version
. org.jboss.xnio/xnio-nio 3.8.8.Final
. org.jboss.xnio/xnio-api 3.8.8.Final
X org.jboss.threads/jboss-threads 2.3.6.Final :older-version
. org.jboss.threads/jboss-threads 3.5.0.Final
X org.jboss.logging/jboss-logging 3.4.1.Final :older-version
. com.github.strojure/web-security 1.2.0-38
Thanks so much for taking the time and putting in the effort to work around this odd S3 quirkiness @tobias!
My pleasure @lread!
Background
An issue was raised on cljdoc about it failing to build docs for an artifact hosted on clojars that has a + in its version: https://github.com/cljdoc/cljdoc/issues/764
I did some digging and currently feel that this might be a clojars issue.
Summary
The details can be found in the cljdoc issue, but to summarize:
When an artifact version contains a +, Clojure tooling will:
My Guess
Clojars hosts artifacts on AWS. When talking to AWS I expect the path portion of the request gets converted to a query string. Maybe?
The plus symbols should not have to be encoded in the path portion of a URL, but this returns a 404: https://repo.clojars.org/com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom
But when we do encode the +, the artifact is found: https://repo.clojars.org/com/github/strojure/undertow/1.4.0%2B123/undertow-1.4.0%2B123.pom
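Both URLs follow from the standard Maven repository layout (group dots become slashes, then artifact, version, and file name). A small Python sketch, with hypothetical helper names, that builds the raw and the encoded variant from the coordinates so they can be compared side by side:

```python
REPO = "https://repo.clojars.org"


def pom_path(group: str, artifact: str, version: str) -> str:
    """Maven repository layout: group/with/slashes/artifact/version/artifact-version.pom"""
    return f"{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.pom"


def candidate_urls(group: str, artifact: str, version: str) -> tuple[str, str]:
    """Return (raw URL, URL with '+' percent-encoded) for a POM (hypothetical helper)."""
    path = pom_path(group, artifact, version)
    return f"{REPO}/{path}", f"{REPO}/{path.replace('+', '%2B')}"


raw, encoded = candidate_urls("com.github.strojure", "undertow", "1.4.0+123")
print(raw)      # the variant that returned a 404 at the time of this report
print(encoded)  # the variant that was found
```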
Should this be fixed?
Well, there are, at the time of this writing, only 63 artifacts with + symbols on clojars. So you could argue, yeah, just don't use a + symbol.
On Maven Central I found that just under 1% of the artifacts used a + symbol. That's more than I expected.
Are other characters problematic?
Maybe, don't know. We should probably think about that.
Next Steps
If you think this is worth addressing, I can lend a hand, take a shot at a PR, etc.