AustinSchuh opened this issue 5 years ago
What would be the design for this?
What about bazel build --repository_mirror=https://mirror.bazel.build, which would automatically rewrite URLs like Austin suggested: https://example.com/download.zip to https://mirror.bazel.build/example.com/download.zip (basically just prefix them)?
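The prefixing behavior proposed here is simple enough to sketch. This is just an illustration of the suggested rewrite rule, not Bazel's actual implementation; the helper name and default mirror host are placeholders:

```python
from urllib.parse import urlparse

def mirror_url(url: str, mirror: str = "https://mirror.bazel.build") -> str:
    """Sketch of the proposed --repository_mirror rewrite: drop the original
    scheme and prefix the mirror base URL onto the host and path."""
    parsed = urlparse(url)
    # https://example.com/download.zip
    #   -> https://mirror.bazel.build/example.com/download.zip
    return f"{mirror}/{parsed.netloc}{parsed.path}"

print(mirror_url("https://example.com/download.zip"))
```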
Open questions:
If it's just about hosting publicly available files somewhere closer to home, then I think the solution for this problem is to support using the remote cache as a repository cache. Both are content-addressable, and running a remote cache is no harder than running your own caching mirror.
However, in general it seems to me that the proper solution for this is to have a --repository_proxy=PROXY flag, just like we have a --remote_proxy flag for remote caching / execution. Bazel wouldn't rewrite the URLs but properly proxy them through PROXY. This is a more generic solution than --repository_mirror that solves all kinds of additional problems that will pop up eventually (just like they did for remote caching): authentication, name resolution / service discovery, load balancing, etc.
@buchgr, my requirement is that I need to be able to go back in time and fully re-create an artifact in 5 years. That means that I need to properly track all the dependencies that are downloaded and make sure they are going to be available on my timeline, not someone else's timeline. Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines. Which is a pain and not sustainable.
For cache locality, we've set up an NGINX proxy next to the build machines that DNS resolves to instead of our dependency server. There are enough knobs today to make that all work.
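For context, a caching proxy like the one described here can be sketched roughly as follows. All hostnames, paths, and sizes are hypothetical; this is a minimal illustration, not the actual configuration:

```nginx
# Minimal NGINX caching-mirror sketch (names are illustrative).
proxy_cache_path /var/cache/nginx/deps keys_zone=deps:10m max_size=10g inactive=1y;

server {
    listen 443 ssl;
    server_name build-deps.example.internal;

    location / {
        # Forward misses to the real dependency server.
        proxy_pass https://deps-origin.example.internal;
        proxy_cache deps;
        # Release artifacts are immutable, so cache successful responses long.
        proxy_cache_valid 200 1y;
    }
}
```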
I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.
@philwo, Y'all might have a different desired level of polish, but from my point of view: 1) happy to do by hand, 2) whichever is easiest. I have other ways of blocking outside access. Flags are cool, but a minimum viable feature is fine, 3) it should definitely get uploaded to the cache, but the cache will drop the dependency at some point. For final production builds, I'm also required to build locally without a cache. :(
+1 to @AustinSchuh's comment about going back in time. It is an absolute requirement for many organizations to check downloaded dependencies into their source tree - even if they do not vendor them. Virtually every company building products with long life cycles (e.g. embedded systems, flight control software, factory automation) checks in the compilers and entire build toolchain so that they can patch very old releases of their products. 10-15 year life cycles are not uncommon.
@AustinSchuh
I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.
I believe this problem is typically solved by setting NO_PROXY, but having --repository_proxy seems good to me, while the --repository_mirror functionality doesn't. So it looks like we are on the same page?
Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines. Which is a pain and not sustainable.
We do the same... lots of hackery. I agree that repository dependencies should be added to the cache, but this is not sufficient.
A --repository_proxy flag seems like it could solve the issue by letting us put the logic behind a service.
@philsc FYI, this was the ticket I was referencing.
I think this feature was addressed by the downloader rewrite feature: https://github.com/bazelbuild/bazel/pull/12170
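For reference, the downloader rewrite feature takes a config file (passed via --experimental_downloader_config) containing rewrite rules with regex backreferences. A minimal sketch, reusing the hosts from Austin's original example; check the Bazel docs for the exact directive syntax in your release:

```
# downloader.cfg: rewrite GitHub codeload URLs to an internal mirror.
rewrite codeload.github.com/(.*) build-deps.peloton-tech.com/codeload.github.com/$1
```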
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.
I'll claim this is still open.
The downloader rewrite feature helps a ton, but doesn't work for python packages or npm packages.
Why doesn't it work for python packages? I might be wrong, but I thought they just used http_archive.
rules_python currently uses pip to download the packages. I am hoping to get started on a version of the rules that uses http_archive instead.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
@philsc , do you recall where the state of the art is for rules_python?
With @aignas's work, I think we can fully rehost pip packages. But there are some subtleties around making the rehosted version look like a pypi mirror, especially since we have custom-compiled wheels. Still working on that part, but I'm 99% sure that no patch for rules_python is needed there.
For reproducibility and control of the lifecycle of our artifacts, we need a way to re-host all our dependencies. Most rules (See https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl for example) fetch dependencies from external domains.
@philwo thought this wasn't crazy.
I'd like to be able to rewrite https://codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b to http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b, for example, in Bazel. The same needs to work for git repositories and any other URLs.