clojars / clojars-web

A community repository for open-source Clojure libraries
https://clojars.org
Eclipse Public License 1.0

Artifacts with a plus character in their version do not resolve from Clojure tooling #862

Closed lread closed 1 year ago

lread commented 1 year ago

Background

An issue was raised on cljdoc about cljdoc failing to build docs for an artifact hosted on Clojars that has a + in its version: https://github.com/cljdoc/cljdoc/issues/764

I did some digging and currently feel that this might be a Clojars issue.

Summary

The details can be found in the cljdoc issue, but to summarize:

When an artifact version contains a +, Clojure tooling fails to resolve the artifact from Clojars.

My Guess

Clojars hosts artifacts on AWS. When talking to AWS I expect the path portion of the request gets converted to a query string. Maybe?

A plus symbol should not have to be encoded in the path portion of a URL, but this returns a 404: https://repo.clojars.org/com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom

But when we do encode the +, the artifact is found: https://repo.clojars.org/com/github/strojure/undertow/1.4.0%2B123/undertow-1.4.0%2B123.pom
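To make the difference between the two URLs concrete, here is a minimal Python sketch that builds the Maven-layout path for an artifact and percent-encodes it. The helper names `artifact_path` and `encoded_url` are made up for illustration; the base URL is the one from the links above, and the standard Maven repository layout (dots in the group become slashes) is assumed.

```python
from urllib.parse import quote

REPO = "https://repo.clojars.org"  # base URL taken from the links above


def artifact_path(group, artifact, version, ext="pom"):
    """Maven-layout path for an artifact (hypothetical helper).

    Dots in the group id become path separators; the version is used as-is.
    """
    return f"{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.{ext}"


def encoded_url(group, artifact, version, ext="pom"):
    """Full URL with the path percent-encoded (hypothetical helper).

    quote() keeps "/" by default but encodes "+" as %2B.
    """
    return f"{REPO}/{quote(artifact_path(group, artifact, version, ext))}"


print(encoded_url("com.github.strojure", "undertow", "1.4.0+123"))
```

For the sample artifact this should print the second, encoded URL above. Clojure tooling requests the unencoded form, which is why resolution fails.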

Should this be fixed?

Well, there are, at the time of this writing, only 63 artifacts with + symbols on clojars. So you could argue, yeah, just don't use a + symbol.

On Maven Central I found that just under 1% of the artifacts used a + symbol. That's more than I expected.

Are other characters problematic?

Maybe, don't know. We should probably think about that.

Next Steps

If you think this is worth addressing, I can lend a hand, take a shot at a PR, etc.

tobias commented 1 year ago

Hi @lread, thanks for the excellent bug report!

> When talking to AWS I expect the path portion of the request gets converted to a query string. Maybe?

This guess was correct. Our Fastly CDN proxies to S3 using the HTTP interface for public buckets. That HTTP interface expects + to be encoded, even though the actual object key doesn't have an encoded +.

So in the case of the sample artifact here, the S3 object key is com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom, but the HTTP URL for it is: https://clojars-repo-production.s3.us-east-2.amazonaws.com/com/github/strojure/undertow/1.4.0%2B123/undertow-1.4.0%2B123.pom

I wasn't able to find any official docs on this behavior, but did find a question about it on StackOverflow that provides some context.

I updated our Fastly VCL to rewrite these requests, and they should now work. Our VCL isn't in the infra repo yet; it has to be managed via the Fastly console, so I can't link to the change. But the snippet used is:

if (req.request ~ "GET|HEAD") {
  # Rewrite pluses to match S3's expected encoding.
  # This has to be a snippet since the Fastly UI doesn't support long strings,
  # and regular strings are urldecoded, turning the escape back into a plus.
  set req.url = regsuball(req.url, "\+", {"%2B"});
}
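For anyone who wants to check the effect of that rewrite outside Fastly, here is a rough Python approximation. It is not the VCL itself, and `rewrite_url` is a made-up name. It assumes the method check is an unanchored regex match and that every literal + in the URL is replaced, which is how the snippet reads.

```python
import re


def rewrite_url(method, url):
    """Approximate the VCL snippet (hypothetical helper, not Fastly code).

    For methods matching GET|HEAD, replace every literal "+" with "%2B".
    Other methods are passed through untouched.
    """
    if re.search(r"GET|HEAD", method):
        return re.sub(r"\+", "%2B", url)
    return url


path = "/com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom"
print(rewrite_url("GET", path))
```

Note that, as written, the substitution applies to the whole URL, so a + in a query string would be rewritten as well.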

> Are other characters problematic?

That's a good question. It's unclear to me what other characters they expect to be encoded, but it would be straightforward to address them in the snippet if we discover more.

lread commented 1 year ago

Woot!:

❯ clojure -Sdeps '{:deps {com.github.strojure/undertow {:mvn/version "1.4.0+123"}}}' -Stree
Downloading: com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.pom from clojars
Downloading: com/github/strojure/undertow/1.4.0+123/undertow-1.4.0+123.jar from clojars
org.clojure/clojure 1.11.1
  . org.clojure/spec.alpha 0.3.218
  . org.clojure/core.specs.alpha 0.2.62
com.github.strojure/undertow 1.4.0+123
  . io.undertow/undertow-core 2.3.5.Final
    . org.jboss.logging/jboss-logging 3.4.3.Final
    . org.jboss.xnio/xnio-api 3.8.8.Final
      . org.wildfly.common/wildfly-common 1.5.4.Final
      . org.wildfly.client/wildfly-client-config 1.0.1.Final
        X org.jboss.logging/jboss-logging 3.3.1.Final :older-version
    . org.jboss.xnio/xnio-nio 3.8.8.Final
      . org.jboss.xnio/xnio-api 3.8.8.Final
        X org.jboss.threads/jboss-threads 2.3.6.Final :older-version
    . org.jboss.threads/jboss-threads 3.5.0.Final
      X org.jboss.logging/jboss-logging 3.4.1.Final :older-version
  . com.github.strojure/web-security 1.2.0-38

Thanks so much for taking the time and putting in the effort to work around this odd S3 quirkiness @tobias!

tobias commented 1 year ago

My pleasure @lread!