bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Want a way to re-host dependencies easily #6342

Open AustinSchuh opened 5 years ago

AustinSchuh commented 5 years ago

For reproducibility and control of the lifecycle of our artifacts, we need a way to re-host all our dependencies. Most rules (See https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl for example) fetch dependencies from external domains.

@philwo thought this wasn't crazy.

I'd like to be able to rewrite https://codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b to http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b for example in Bazel. The same needs to work for git repositories and any other URLs.
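Concretely, this is roughly what we end up writing by hand today for every dependency (just a sketch; the repository name and checksum are illustrative):

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Sketch of the manual workaround: point the dependency at our re-hosted copy
# instead of the upstream URL.
http_archive(
    name = "org_golang_x_tools",  # illustrative name
    urls = ["http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b"],
    sha256 = "<checksum of the archive>",  # pinned so the build stays reproducible
)
```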

dslomov commented 5 years ago

What would be the design for this?

philwo commented 5 years ago

What about bazel build --repository_mirror=https://mirror.bazel.build, which would automatically rewrite URLs like Austin suggested: https://example.com/download.zip would become https://mirror.bazel.build/example.com/download.zip (basically just prefix them).
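In other words, something like this hypothetical helper (pure pseudocode for the prefixing, not an existing API):

```starlark
def mirror_url(url, mirror = "https://mirror.bazel.build"):
    # Drop the scheme and prefix the mirror, so
    # https://example.com/download.zip -> https://mirror.bazel.build/example.com/download.zip
    _, _, rest = url.partition("://")
    return mirror + "/" + rest
```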

Open questions:

buchgr commented 5 years ago

If it's just about hosting publicly available files somewhere closer to home, then I think the solution to this problem is to support using the remote cache as a repository cache. Both are content-addressable, and running a remote cache is no harder than running your own caching mirror.

However, in general it seems to me that the proper solution for this is to have a --repository_proxy=PROXY flag, just like we have a --remote_proxy flag for remote caching / execution. Bazel wouldn't rewrite the URLs but would properly proxy them through PROXY. This is a more generic solution than --repository_mirror and solves all kinds of additional problems that will pop up eventually (just like they did for remote caching), like authentication, name resolution / service discovery, load balancing, etc.

AustinSchuh commented 5 years ago

@buchgr, my requirement is that I need to be able to go back in time and fully re-create an artifact in 5 years. That means that I need to properly track all the dependencies that are downloaded and make sure they are going to be available on my timeline, not someone else's. Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

For cache locality, we've set up an NGINX proxy next to the build machines; DNS resolves to it instead of our dependency server. There are enough knobs today to make that all work.

I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

@philwo , y'all might have a different desired level of polish, but from my point of view: 1) happy to do it by hand, 2) whichever is easiest; I have other ways of blocking outside access, and flags are cool, but a minimum viable feature is fine, 3) it should definitely get uploaded to the cache, but the cache will drop the dependency at some point. For final production builds, I'm also required to build locally without a cache. :(

aiuto commented 5 years ago

+1 to @AustinSchuh's comment about going back in time. It is an absolute requirement for many organizations to check downloaded dependencies into their source tree - even if they do not vendor them. Virtually every company building products with long life cycles (e.g. embedded systems, flight control software, factory automation) checks in the compilers and the entire build toolchain so that they can patch very old releases of their products. 10-15 year life cycles are not uncommon.

buchgr commented 5 years ago

@AustinSchuh

I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

I believe this problem is typically solved by setting NO_PROXY, but having --repository_proxy seems good to me, whereas the --repository_mirror functionality doesn't. So it sounds like we are on the same page?

ob commented 5 years ago

Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

We do the same... lots of hackery. I agree that repository dependencies should be added to the cache, but this is not sufficient.

A --repository_proxy flag seems like it could solve the issue by letting us put the logic behind a service.

AustinSchuh commented 5 years ago

@philsc FYI, this was the ticket I was referencing.

sitaktif commented 3 years ago

I think this feature was addressed by the downloader rewrite feature: https://github.com/bazelbuild/bazel/pull/12170
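If I read the PR right, you point --experimental_downloader_config at a file of allow/block/rewrite directives, roughly like this (hostnames taken from the example at the top of this issue; please double-check the exact syntax against the PR):

```
block codeload.github.com
allow build-deps.peloton-tech.com
rewrite codeload.github.com/(.*) build-deps.peloton-tech.com/codeload.github.com/$1
```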

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

AustinSchuh commented 1 year ago

I'll claim this is still open.

The downloader rewrite feature helps a ton, but it doesn't work for python packages or npm packages.

matts1 commented 1 year ago

Why doesn't it work for python packages? I might be wrong, but I thought they just used http_archive.

philsc commented 1 year ago

rules_python currently uses pip to download the packages. I am hoping to get started on a version of the rules that uses http_archive instead.
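Since a wheel is just a zip archive, the shape I have in mind is roughly this (untested sketch; the package name, URL, and checksum are made up):

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "pypi_six",
    urls = ["https://build-deps.example.com/pypi/six-1.16.0-py2.py3-none-any.whl"],
    type = "zip",  # wheels are zip archives, so http_archive can extract them
    sha256 = "<checksum of the wheel>",  # pin the real checksum
    build_file_content = """
py_library(
    name = "six",
    srcs = glob(["**/*.py"]),
    visibility = ["//visibility:public"],
)
""",
)
```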

github-actions[bot] commented 1 month ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

AustinSchuh commented 1 month ago

@philsc , do you recall where the state of the art is for rules_python?

philsc commented 1 month ago

With @aignas's work, I think we can fully rehost pip packages. But there are some subtleties around making the rehosted version look like a PyPI mirror, especially since we have custom-compiled wheels. Still working on that part, but I'm 99% sure that no patch for rules_python is needed there.
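On the consumption side it should then just be a matter of pointing pip at the internal index, e.g. something like this (a sketch; the mirror URL is made up and I'm going from memory on the pip_parse attributes):

```starlark
load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "pip_deps",
    requirements_lock = "//:requirements_lock.txt",
    extra_pip_args = ["--index-url=https://build-deps.example.com/pypi/simple"],
)
```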