abdolence / sbt-gcs-resolver

SBT plugin for Google Cloud Storage (GCS) and Google Artifact Registry with Coursier support
Apache License 2.0

The same artifact is downloaded 6 times #66

Closed by peterrosell 4 months ago

peterrosell commented 4 months ago

When building with sbt and Coursier, and using this plugin to fetch artifacts from Google Artifact Registry, I noticed that the same artifact/file is downloaded 6 times.

This is an example from the sbt logs for the artifact scala-compiler-2.13.3.jar. I get the same for the sha1 file.

[info] Checking artifact at url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
[info] Receiving artifact from url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
[info] Receiving artifact from url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
[info] Receiving artifact from url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
[info] Receiving artifact from url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
[info] Receiving artifact from url: artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar.
artifactregistry://europe-maven.pkg.dev/my-project/my-repo/org/scala-lang/scala-compiler/2.13.3/scala-compiler-2.13.3.jar
  100.0% [##########] 10.7 MiB (2.3 MiB / s)

I have confirmed this on the back end by adding a custom remote repository to a virtual repository and checking the logs in the upstream nginx ingress.

Looking at the code, I see that it creates a new request for every call to getInputStream. We don't know how sbt/Coursier calls the getInputStream method.

Looking at HttpURLConnection and how it handles getInputStream calls, we see that it opens the input stream once, saves it, and returns the same InputStream instance for subsequent calls.
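To illustrate the caching behaviour described above, here is a minimal sketch in Java. It is not the plugin's code: `StreamOpener`, `CachingStreamSource`, and `CachingDemo` are hypothetical names, and the "remote request" is simulated with an in-memory stream. The point is only that repeated `getInputStream` calls trigger a single underlying request.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for whatever code performs the remote download. */
interface StreamOpener {
    InputStream open() throws IOException;
}

/** Opens the underlying stream at most once and returns the same instance afterwards. */
class CachingStreamSource {
    private final StreamOpener opener;
    private InputStream cached; // stays null until the first getInputStream call

    CachingStreamSource(StreamOpener opener) {
        this.opener = opener;
    }

    synchronized InputStream getInputStream() throws IOException {
        if (cached == null) {
            cached = opener.open(); // the only place a request is issued
        }
        return cached;
    }
}

class CachingDemo {
    public static void main(String[] args) throws IOException {
        int[] requests = {0}; // counts simulated remote requests
        CachingStreamSource source = new CachingStreamSource(() -> {
            requests[0]++;
            return new ByteArrayInputStream("jar bytes".getBytes());
        });

        InputStream first = source.getInputStream();
        for (int i = 0; i < 5; i++) {
            // Every later call must hand back the very same instance.
            if (source.getInputStream() != first) {
                throw new AssertionError("expected the cached stream");
            }
        }
        System.out.println("remote requests: " + requests[0]); // prints "remote requests: 1"
    }
}
```

One caveat with this approach: all callers share the same stream, so a caller that expects to re-read the content from the start will not get fresh data from a later call.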

I'm also not sure that the HEAD request in the connect method is necessary.

I will create a PR with a suggested change to reduce the number of requests, which should speed up builds and lower the load on the repositories.

abdolence commented 4 months ago

Interesting finding. It is probably a good idea to cache the InputStream object if callers are just calling the method to get a reference to it. I'll check it.

abdolence commented 4 months ago

I think this should help: https://github.com/abdolence/sbt-gcs-resolver/pull/67

peterrosell commented 4 months ago

I created a PR as well, as I have been testing this during the day to find a solution. I can try your version and see if it works as expected. Then I can close my PR.

abdolence commented 4 months ago

Oh, I just saw you have a PR as well. I think we're doing something similar, sorry for the confusion. Thanks for reporting and testing! I tested it in my environment and it worked. I've been preparing the release with updated deps etc., so please let me know if it doesn't work for you before I release it.

abdolence commented 4 months ago

This should be fixed now in https://github.com/abdolence/sbt-gcs-resolver/releases/tag/v1.10.0

peterrosell commented 4 months ago

Lovely! I tested it and it works fine. Much appreciated! :heart:

I have some more questions about logging and the HEAD request, but I opened two separate issues for those, #70 and #71.