Closed kchodorow closed 5 years ago
Added a proposal sorta related to this in #1733. Feel free to close it out and schlep it into this issue.
:+1: for adding transitive deps to this. is there any technical reason that we're aware of that we haven't made transitive deps work yet?
It depends on what you mean by transitive deps working. The biggest problem right now I feel is that maven_jar doesn't let one define the dependency relationships. I've fixed this in the java_import_external repository rule which I'll be contributing to Bazel shortly.
I've also built a web GUI which I'm currently seeking approval to launch which will make it easy for users to generate configurations for this rule. The web GUI will read the pom.xml files from the Maven server, resolve transitive and diamond dependencies, and create code that shows you exactly what's going into your project. I feel like this is the best direction for Bazel. It leads to much faster builds which are actually hermetically sealed without magic.
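To make "resolve transitive and diamond dependencies" concrete, here is a minimal sketch of the idea, not the actual generator. The `POMS` table is hypothetical data standing in for what would be read from pom.xml files, and `resolve` is an illustrative name. The breadth-first walk keeps the version found closest to the root, which resembles Maven's nearest-definition mediation; the policy the web GUI actually uses is not described in this thread.

```python
from collections import deque

# Hypothetical dependency metadata, standing in for what would be read
# from pom.xml files. Keys and values are "group:artifact:version".
POMS = {
    "app:app:1.0": ["lib:a:1.0", "lib:b:1.0"],
    "lib:a:1.0": ["com.google.guava:guava:20.0"],
    "lib:b:1.0": ["com.google.guava:guava:21.0"],
    "com.google.guava:guava:20.0": [],
    "com.google.guava:guava:21.0": [],
}


def resolve(root, poms):
    """Return {group:artifact -> version} for root and all transitive deps.

    Walks breadth-first, so when two paths reach the same artifact (a
    diamond), the version found closest to the root is kept.
    """
    chosen = {}
    queue = deque([root])
    while queue:
        coord = queue.popleft()
        group, artifact, version = coord.split(":")
        key = group + ":" + artifact
        if key in chosen:
            continue  # already resolved via a nearer or earlier path
        chosen[key] = version
        queue.extend(poms.get(coord, []))
    return chosen


resolved = resolve("app:app:1.0", POMS)
for key in sorted(resolved):
    print(key, resolved[key])
```

The output is a flat, explicit list of coordinates, which is the kind of thing a generated config can spell out so that nothing is fetched implicitly at build time.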
By transitive deps working, I mean the rule fetching the dependency relationships from the Maven server and not requiring the developer to specify them.
In order to do that in a repository rule, it would probably be necessary to have the rule shade all the transitive jars into the root jar. That means rewriting the transitive class names, rewriting the byte code, and then the code size increases quadratically.
Hmmm, I'm not sure I follow. Why would it be necessary to shade the transitive jars and rewrite class names? Perhaps I'm missing something, but the way I would expect it to work would be:
... exports. These targets would be something like @somemavenjar//jar:dep_on_guava_21.0.
... @somemavenjar//jar, which exports all of its dependencies.
I don't know how to do 1, but I assume there must be a way since other build tools do this.
Having a single remote repository for all the maven jars required by the project, and each individual jar being its own rule within the repository, would avoid the need for shading. E.g. @closure_rules_maven_jars//:com_google_guava. Shading is only necessary if you want to have the same behavior as maven_jar where jars have a 1:1 mapping with repository names.
But doing things that way introduces another problem. What if another Bazel project depends on that Bazel project? It would have to adopt @closure_rules_maven_jars as its container for all its jars, and then redefine the whole thing, in order to put its own jars in there. If it doesn't do that, then we end up with quadratic dependencies again.
There's a lot of value to not fetching transitive dependencies auto-magically. For example, with the web GUI I just wrote, I generated the following config for com.google.template:soy:2016-08-25. In doing so, I was able to identify a bug in com_google_common_html_types, which depends on Guava Testing Library without declaring it as a test-scoped dependency. I was also able to audit the licenses of all my transitive dependencies very easily. But most importantly, by using this config, builds are going to go insanely fast for my users, because calculating that config required downloading 150 things, e.g. pom.xml files. Furthermore, I'm able to effectively mirror my dependencies so builds can be durable and never break.
@jart The web gui sounds awesome. Are you close to open sourcing it?
Expect it at some point in the upcoming months. I need to go through the process. I've also got a lot of other stuff on my plate with TensorFlow.
FWIW: the Gerrit Code Review project created its own version of the maven_jar Skylark rule and extracted it to the bazlets repository: [1]. It does not use mvn, though.
Do any of the skylark maven rules work on Windows?
Bazel has a Skylark maven_jar rule which uses mvn. Isn't that what this ticket is about? Or is it open as an aggregate for all the missing features? As in, we have something but it's not mature enough?
Is there a benchmark between the native and skylark versions? Sounds like spawning a new mvn per jar can be really expensive when talking about a repo with hundreds or thousands of external dependencies
java_import_external is native and will download jars as fast as your internet connection goes. Kristina and I spent a lot of time designing Bazel's native downloader for scalability and 99.9% reliability. For example, bazel fetch on this configuration with 59 downloads happens in four seconds.
@jart I might have misunderstood something, but the Skylark maven_jar version does not use Bazel's native downloader, right? It uses mvn. Are you simply pointing me to a more robust alternative which you trust? In any case I appreciate you taking the time :)
maven_rules.bzl farms out downloads to the system mvn command, as you pointed out. native.maven_jar farms out downloads to some third party java library (see MavenDownloader) which is much faster than running the mvn command, but not as robust as Bazel's downloader.
In order to benefit from Bazel's highly advanced downloader, you have to call repository_ctx.download or repository_ctx.download_and_extract in Skylark, or use any of the native workspace rules with the exception of http_jar and maven_jar.
ok, that was the missing piece. Thanks!
One important aspect for us is not only to accelerate the download process and/or make it more robust, but to try very hard to avoid the download and save network bandwidth in the first place. The Gerrit Code Review project has a lot (ca. 150) of third-party dependencies. There are also more than 100 Gerrit plugins. If you build all the dependent projects and all plugins, or even if you clone stable branches into their own project directories, you end up with all currently available incarnations of Bazel's maven_jar fetching the same artifacts into different locations hundreds of times!
The only exception is Gerrit's own maven_jar, which was originally written by Shawn (Gerrit Code Review maintainer) when Gerrit used Buck and was a straightforward port to a Bazel Skylark rule. We stage all [1] downloaded artifacts in the ~/.gerritcodereview directory and hard link them into the Bazel project location. Yes, we do that using curl from a Python script. But having done this for the last 5 years, we've never had any issues with it. As a consequence, if 100 clones of the same or different projects use version 42 of artifact A, it is downloaded only once and never again.
[1] Unfortunately this is not true any more: since we depend on Bazel's closure rules, we lost the ability to use the staging directory feature for 100% of our dependencies. That's because closure rules depend on java_import_external, which we do not control. See this commit for more context and background.
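The staging approach described above (download once into a shared directory, then hard link into each clone) can be sketched roughly as follows. This is not Gerrit's actual script: `stage_artifact` and `fake_fetch` are hypothetical names, and the fetch step is passed in as a callable so the example runs without network access, where the real script is said to shell out to curl.

```python
import os
import tempfile


def stage_artifact(name, cache_dir, project_dir, fetch):
    """Hard link `name` from a shared cache into project_dir.

    `fetch(path)` is called to download the artifact only when the cache
    does not already hold it.
    """
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(project_dir, exist_ok=True)
    cached = os.path.join(cache_dir, name)
    if not os.path.exists(cached):
        # Download to a temporary name, then rename, so an interrupted
        # download never leaves a partial file under the final name.
        tmp = cached + ".tmp"
        fetch(tmp)
        os.replace(tmp, cached)
    dest = os.path.join(project_dir, name)
    if not os.path.exists(dest):
        os.link(cached, dest)  # requires both paths on one filesystem
    return dest


# Demonstration with a fake downloader that records each invocation.
downloads = []


def fake_fetch(path):
    downloads.append(path)
    with open(path, "wb") as f:
        f.write(b"jar bytes")


root = tempfile.mkdtemp()
cache = os.path.join(root, "cache")
clone_a = stage_artifact("A-42.jar", cache, os.path.join(root, "clone_a"), fake_fetch)
clone_b = stage_artifact("A-42.jar", cache, os.path.join(root, "clone_b"), fake_fetch)
print(len(downloads))
```

Two clones asking for the same artifact trigger a single fetch, and both end up pointing at the same file on disk.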
Bazel's downloader supports the HTTP_PROXY environment variable. Just set that to a Squid proxy running on your network and you're good to go.
As this ticket is getting a bit long and convoluted, here's a summary of the state of the maven_jar:
There currently exist several options for maven_jar:
Things left to do:
Thank you for the support. Note for our readers: @foo//:foo can be written as @foo, and java_import_external creates a @foo//jar alias.
Great recap, thanks!
Any thoughts of bundling @jart's version in Bazel or in a smaller repo? Not sure I want to depend on rules_closure only for this
On Mon, 7 Aug 2017 at 22:44 Kristina wrote:
As this ticket is getting a bit long and convoluted, here's a summary of the state of the maven_jar:
There currently exist several options for maven_jar:
1. The native maven_jar rule. This does not support auth and uses Maven's own libraries to download jars, which are not quite as reliable nor cacheable as the other options.
2. The @bazel_tools//build_defs/repo/maven_rules.bzl rule that @jin implemented. Downside is that it spawns one Maven process per maven_jar rule, which can be very slow. Pros are that it uses Maven directly, so it respects auth/proxy settings you have on your system.
3. @jart's java_import_external rule: much more flexible than any other option (look at all these attributes: https://github.com/bazelbuild/rules_closure/blob/master/closure/private/java_import_external.bzl#L106-L121) and uses the multiplexing downloader @jart wrote to be fast and reliable. Downsides are that it won't pick up on system auth settings and it uses a different naming scheme than the others (@foo//:foo, if I recall correctly, instead of @foo//jar). I recommend using this one, if possible.
Things left to do:
- Download src jars in 1 & 2.
- Add an option for downloading the docs jars.
- Add sha256 as a checksumming option.
- Support auth in 1 & 3.
Thanks for the feedback, updated the comment to remove the note about the different naming and added a mention that mvn has to be installed for #2.
@jart actually had a CL adding it to Bazel but I don't think it was ever submitted.
Only option 2 supports AAR files. It would be great if whatever solution we settled on supported arbitrary artifact packaging types.
@kchodorow I've mailed you a changelist adding java_import_external to Bazel. The community should be able to expect it soon. I've also added very helpful documentation with examples.
Speaking as a Bazel newbie, presenting multiple solutions for maven migration is very confusing.
A single, well supported, documented and "official" maven migration solution would be really nice, and I think is key for driving bazel adoption for Java projects.
@jart we (scala people) have a need to be able to turn off ijar creation for some external jars. A current ad hoc solution is to use the native maven_jar and a custom scala_import which uses the file instead of the java_library. Will it be possible to support disabling ijars on specific cases?
If the Bazel authors add an attribute to java_import that turns off ijar creation, then java_import_external will absolutely be updated, since the latter is basically the same rule with some urls attributes added.
Thanks! @kchodorow are you the right person to ask?
@jart did you end up adding java_import_external to somewhere in Bazel?
@cgrushko Indeed I did. It was added to the Bazel codebase 28 days ago in https://github.com/bazelbuild/bazel/commit/062fe70189fc622285833311d241021be313680b. Judging by the baseline, it doesn't look like it made it into 0.5.4, but it's certain to make it into the next one. I hope you enjoy this rule. Usage examples can be found in Closure Rules, Nomulus, and many other places.
@jart Does that rule support authentication to a private maven repo (Artifactory in our case)?
If not, any ETA?
Hey @jart
Rumor has it that you also created some GUI tool for converting Maven coordinates to java_import_external. Is it open sourced? We'd love to check it out!
Any news regarding that web tool you've been mentioning in other bugs, @jart? I kind of want to migrate my repo to java_import_external, but without something like generate_workspace to resolve transitive dependencies, it's quite a lot of work.
Behold Bazel Maven Config Generator in https://github.com/bazelbuild/bazel/pull/3946 and the demo video on YouTube. @or-shachar @StephenAmar
From a quick glance of the above PR, it looks like this does not support private Maven repos such as Artifactory?
@wstrange I don't see why it wouldn't. It also depends on what you mean. For example, you can just sed "repo1.maven.org" in index.html to whatever and it'll crawl the POMs. If you want it to be able to crawl multiple POM repos, that might not be a trivial change.
Also keep in mind that java_import_external has no awareness of POM metadata. It just grabs jars from whatever URL. I'm also pretty sure Bazel's downloader can do HTTP auth using environment variables. See ProxyHelper.java. It's also probably possible to put the user:pass in the URLs itself, although you might not want to check that into your codebase.
It's also worth mentioning that Google Drive mirroring feature sort of magically and painlessly creates your own private Maven server on the fly. Although it just mirrors the JARs since that's all java_import_external needs.
[Disclaimer: I am a Bazel newbie, so the questions I am asking may not make sense ;-) ]
The way our Artifactory repo works is that there could be several different repos defined, and each has a potentially different set of credentials. So the http auth credentials used by java_import_external would vary depending on which repo the dependency is coming from.
Maven handles all of this by using the credentials defined in ~/.m2/settings.xml. It is not clear to me how to accomplish the same thing with Bazel.
Is Artifactory sort of like a really robust Squid caching proxy? Reading about it, I couldn't help but notice that Artifactory Enterprise Edition offers five-nines availability. I actually have a great deal of respect for the JFrog developers, for having achieved this level of reliability. It's a level of engineering most thought only AT&T and Chubby could master. Even Google Cloud Storage, with its transcontinental redundancy, is only able to promise three-nines. However java_import_external can actually deliver Erlang reliability. If the urls=[...] attribute has mirrors to three three-nine CDNs then you get nine-nines availability ((1-(1-99.9/100)**3)*100 = 99.9999999). If Jesus Christ used Bazel then there'd be about 63 seconds thence when builds could break on downloads. But if we consider that Bazel retries failed requests with exponential backoff for longer than that, then the reliability that spans the ages actually transcends nines and becomes 100. Bazel Community Edition can offer you this incredible level of value, not just for the low-low price of $29,500/year. No my friends, in fact, it doesn't even cost $14,750. You can have it all for the bargain basement price of zero dollars. Yes ladies and gentlemen it's free, and the source code comes included.
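For anyone who wants to check the arithmetic, here is a small worked version. It assumes the three mirrors fail independently of one another, and it takes "since Jesus Christ" to mean roughly 2017 years, which is an assumption based on when this comment was written; `combined_availability` is just an illustrative name.

```python
def combined_availability(per_mirror, mirrors):
    """Availability when a download fails only if every mirror is down.

    Assumes mirror outages are independent of one another.
    """
    return 1 - (1 - per_mirror) ** mirrors


SECONDS_PER_YEAR = 365.25 * 24 * 3600

availability = combined_availability(0.999, 3)
downtime_fraction = (1 - 0.999) ** 3
# Expected downtime accumulated over roughly 2017 years (an assumption).
downtime_seconds = downtime_fraction * 2017 * SECONDS_PER_YEAR

print("availability: %.7f%%" % (availability * 100))
print("downtime over 2017 years: %.1f s" % downtime_seconds)
```

If the mirrors share a failure mode (same CDN provider, same DNS), the independence assumption no longer holds and the real figure would be lower.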
But it might need improvement when it comes to that private authentication use case. It's one I haven't considered, because I mostly do open source stuff. Also internally at Google we just vendor everything in our monolithic repo.
One thing you could do is put this in your zone:
    $TTL 0
    artifacts IN A 192.168.10.4
              IN A 192.168.10.5
              IN A 192.168.10.6
Put this on your servers:
    # Python 2 script: a small forwarding proxy that injects credentials.
    import BaseHTTPServer
    import SocketServer
    import base64
    import httplib
    import shutil
    import urlparse

    # Build an HTTP Basic Authorization header value.
    basic = lambda u,p: 'Basic %s' % base64.b64encode('%s:%s' % (u,p))

    # Credentials to inject, keyed by the upstream host being requested.
    AUTHORIZATIONS = {
        'maven.initech.com': basic('aladdin', 'opensesame'),
        'maven.vendoro.com': basic('aladdin', 'opensesame'),
        'localhost:5000': basic('aladdin', 'opensesame'),
    }

    class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
      def go(self):
        # A proxy request carries the absolute URL in the request line.
        ru = urlparse.urlparse(self.path)
        # Same URL with scheme and host stripped, for the upstream request.
        pu = urlparse.ParseResult('', '', ru.path, ru.params, ru.query, ru.fragment)
        auth = AUTHORIZATIONS.get(str(ru.netloc))
        if auth:
          self.headers['Authorization'] = auth
        self.headers['Host'] = ru.netloc
        if ru.scheme == 'https':
          c = httplib.HTTPSConnection(ru.netloc)
        else:
          c = httplib.HTTPConnection(ru.netloc)
        try:
          # Forward the request line and headers upstream.
          c.putrequest(self.command, pu.geturl())
          for k, v in self.headers.items():
            c.putheader(k, v)
          c.endheaders()
          # Relay the upstream status, headers, and body to the client.
          r = c.getresponse()
          self.send_response(r.status)
          for k, v in r.getheaders():
            self.send_header(k, v)
          self.end_headers()
          shutil.copyfileobj(r, self.wfile)
          self.wfile.flush()
        finally:
          c.close()
      do_GET = go
      do_HEAD = go

    class ThreadedHTTPServer(SocketServer.ThreadingMixIn,
                             BaseHTTPServer.HTTPServer):
      daemon_threads = True

    ThreadedHTTPServer(('', 4000), Handler).serve_forever()
Then run Bazel like this:
    $ HTTP_PROXY=http://artifacts:4000 bazel build //...
And you should be good.
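The proxy script above uses Python 2 module names. For readers on Python 3, the credential-injection part alone might look like the sketch below. The host names and credentials are the same placeholders as in the script, and `auth_for` is a hypothetical helper name. This covers only the per-host header lookup, not the proxying itself.

```python
import base64
from urllib.parse import urlparse


def basic(user, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Placeholder hosts and credentials, mirroring the script above.
AUTHORIZATIONS = {
    "maven.initech.com": basic("aladdin", "opensesame"),
    "maven.vendoro.com": basic("aladdin", "opensesame"),
}


def auth_for(url):
    """Return the Authorization value for url's host, or None."""
    return AUTHORIZATIONS.get(urlparse(url).netloc)


print(auth_for("http://maven.initech.com/repo/foo.jar"))
print(auth_for("http://repo1.maven.org/maven2/foo.jar"))
```

Because the table is keyed by host, each private repository can carry its own credentials, which is the per-repo behavior asked about earlier in the thread.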
So I think what you are saying is that when you are at 10 nines of availability, you have no place to go. Bazel goes to 11 nines.
Artifactory and Nexus are very common in the "enterprise" space. If Bazel is to attract hordes of Java developers (and that may not be a goal ;-) ), having first class support for private maven repositories (with authentication) is essential.
The proxy idea is super creative (I really appreciate you taking the time to put together a solution). I'll review it - but I think it will be a non starter in my organization. The solution has to be integrated and out of the box.
I return to looking at Bazel every 6 months or so, because we desperately need something like it (maven build and test times are getting absurd). But I have to sell this internally, and the maven migration experience is just not there yet. I'll be back though ;-)
Hi Warren. I'd encourage you to file an issue on rules_maven. It uses gradle to resolve transitive deps under the hood. As gradle already factors in the settings.xml file when fetching artifacts, I'd wager that getting this to work might not be too hard. We'd just have to be able to pass in your settings.xml file as a label to the maven_repository rule such that it can be discovered. It may also require some tweaking of the repositories attribute that maps GROUP:NAME patterns to the (Artifactory) URL where those artifacts can be found.
@wstrange I encourage you to file a feature request asking for the ability to add, say, fetch --auth user:pass@user.com to ~/.bazelrc so the downloader can do Basic Authentication (see also). It's not an unreasonable thing to ask, and wouldn't be difficult to implement. But there's the proxy solution in the interim.
I can't speak for the Bazel team or Google, but I'm sure they want nothing more than the largest number of people to benefit from Bazel as possible. While we're in the business of sharing world-class technology, we can't always be in the business of solutions, and some assembly is required. I think that's OK, because it creates opportunities for entrepreneurs to build those turn-key solutions on top of the work we're sharing.
For example, nothing would make me happier than to see someone come along, take that Apps Script I posted a few comments ago, and get rich turning it into a business. If that ends up being one of you, buy me a drink next time you're in the Bay Area.
@jart Thanks a lot for the config generator. It was very useful.
A tricky question for you though. I'm having a lot of trouble using extra_build_file_content because I can't seem to be able to use non native rules there (like a rule to shade libraries, or scala specific rules).
Any ideas?
I would advise against doing anything nontrivial in extra_build_file_content. You can probably do it in your main repo build files. Otherwise, you might be able to load() the appropriate skylark rules, possibly using "@//..." syntax to reference the main repo.
All such feature requests now belong in https://github.com/bazelbuild/rules_jvm_external
Some FRs that have come up in the past: