elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Confusing usages of compileOnly vs api in plugin and module projects #84643

Open mark-vieira opened 2 years ago

mark-vieira commented 2 years ago

I'm going through some of our modules and even I don't quite understand what should be compileOnly and what should be api and vice versa. It's likely developers are completely getting this wrong.

Take the analysis-common module: it extends painless, so it has a compileOnly dependency on it. Since the dependency is compileOnly, the painless module jar is not bundled with the analysis-common module, but I assume some kind of magic in our plugin loading makes painless available in its classloader, presumably via the extendedPlugins mechanism?
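As a sketch of the pattern described above (the project paths and descriptor syntax here are assumptions, not copied verbatim from the real build), the compileOnly-plus-extendedPlugins combination would look roughly like this in the module's build.gradle:

```groovy
// modules/analysis-common/build.gradle (hypothetical sketch, paths assumed)
esplugin {
  name = 'analysis-common'
  // tells the plugin loader to make painless's classloader
  // visible to this module at runtime
  extendedPlugins = ['lang-painless']
}

dependencies {
  // compile classpath only: the painless jar is NOT bundled
  // into the analysis-common module
  compileOnly project(':modules:lang-painless')
}
```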

Now for the kibana module, it uses the api configuration here, which means that module jar is bundled in the kibana module. Which is correct? Should kibana treat this as compileOnly as well and indicate that it extends the other?

In places where we use api it means we are rebundling that dependency. Taking the reindex module for example, we have actually included it in the distribution 3 times: once in the reindex module itself, and again in both the kibana and x-pack enrich modules. Should the latter be bringing it in as compileOnly and declaring that they extend the module, or is there a reason for having multiple copies here?

Alternatively, is there a way we can remove use of the compileOnly configuration here altogether and figure this out for devs implicitly? We know how ES is bundled and which dependencies will be "provided" or not. We shouldn't have to have developers do this manually, as it's clearly easy to make mistakes here.

elasticmachine commented 2 years ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticmachine commented 2 years ago

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira commented 2 years ago

My gut feeling is we ditch the use of compileOnly altogether and just have folks use implementation as that's a much better description of what they want. It also avoids having to add these as testImplementation, since compileOnly dependencies aren't brought onto the test classpath, and similarly for anything downstream, which complicates things even more.
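The duplication described above can be sketched like this (module path assumed for illustration):

```groovy
dependencies {
  // compileOnly does NOT flow to the test classpaths, so the same
  // dependency has to be declared a second time for tests:
  compileOnly project(':modules:lang-painless')
  testImplementation project(':modules:lang-painless')

  // by contrast, a single implementation declaration would reach both
  // the main and test classpaths automatically:
  // implementation project(':modules:lang-painless')
}
```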

The filtering should be done when we bundle the plugin, and we can figure this out there. Essentially, what should be bundled in the plugin is everything on the runtime classpath minus anything provided by the Elasticsearch "platform" at runtime. Right now that's basically :server and its dependencies. Perhaps we can do something similar to what I spiked out for the modularization stuff, where what gets bundled is a filtered artifact view.
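A minimal sketch of that filtering idea, assuming a hypothetical bundlePlugin Zip task and direct access to :server's resolved configuration (the real spiked implementation may use an artifact view instead):

```groovy
// Hypothetical sketch: bundle the runtime classpath minus whatever the
// Elasticsearch "platform" (:server and its dependencies) already provides.
def providedByServer = project(':server').configurations.runtimeClasspath

tasks.named('bundlePlugin', Zip) {
  from(configurations.runtimeClasspath.filter { jar ->
    !providedByServer.files.contains(jar)
  })
}
```

With this in place, module authors would declare everything as implementation and the build would decide what to bundle, rather than each developer choosing compileOnly by hand.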

@rjernst @ChrisHegarty specifically looking to y'all to provide some thoughts here.

rjernst commented 2 years ago

It looks like there are a couple issues/questions you raise here. Let me see if I can summarize them.

  1. compileOnly is used as a way to include code from server (or an extended plugin) that will be available at runtime. This configuration is used to include the dependency on the compileClasspath, but exclude it from the bundled plugin/module. Could we instead specify these dependencies as implementation, and filter out these jars when building the plugin?
  2. We have instances of modules which use other modules by including them (either with implementation or api). This results in several jar files being duplicated within our distribution. Why? Can we utilize the "magic" of extended plugins to eliminate this duplication?

To (1), I think this would be a good simplification, and it would clean up the test dependencies specification, as you pointed out.

(2) is more complicated.

When we started splitting server out into modules (note: not JPMS modules, at least not yet), the intent was to have a flat hierarchy. There were many reasons for that model being preferred, but the most relevant here was to facilitate isolated sets of code which would be easily testable on their own. A more practical reason was that a single level of plugin class loaders was easy to model and implement.

There were downsides to this approach immediately, which we knew going into it. The most egregious example still exists today: xpack-security includes the transport-netty module. This is because security extends the transport implementation to add SSL, and we would not want to duplicate everything in transport-netty, but instead reuse it. Effectively though we are duplicating it, just at build time, so that we get two copies of the transport-netty jar, as well as all of the netty jars, just in different classloaders. Although the xpack-security and transport-netty modules both include many of the same classes, we don't get jarhell because they are in separate classloaders. This is unfortunate, but they still function independently because there is no interaction between the two modules. If there were, we would get the dreaded "Cannot cast class X to class X".
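The "Cannot cast class X to class X" failure mode above can be demonstrated in isolation. This is a self-contained sketch (class names are invented for illustration): the same bytecode defined by two different classloaders yields two distinct Class objects, just like a jar bundled into two plugin classloaders.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ClassLoaderIsolationDemo {
    public static class Payload {} // stands in for a class bundled into two modules

    // A child-first loader that redefines Payload from the same .class bytes,
    // analogous to a second plugin classloader containing a duplicate jar.
    static class IsolatedLoader extends ClassLoader {
        IsolatedLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(Payload.class.getName())) {
                byte[] bytes = readBytes(name);
                return defineClass(name, bytes, 0, bytes.length);
            }
            return super.loadClass(name, resolve);
        }

        private byte[] readBytes(String name) {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getResourceAsStream(path)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                return out.toByteArray();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> duplicate = new IsolatedLoader(ClassLoaderIsolationDemo.class.getClassLoader())
                .loadClass(Payload.class.getName());
        // Same bytes, different defining loader: distinct Class objects.
        System.out.println("same Class object: " + (duplicate == Payload.class));
        Object instance = duplicate.getDeclaredConstructor().newInstance();
        // A cast across the two would throw ClassCastException; isInstance is false.
        System.out.println("castable: " + Payload.class.isInstance(instance));
    }
}
```

Both checks print false, which is exactly why the duplicated transport-netty copies only stay out of trouble as long as the two modules never exchange instances.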

Extended plugins were added to solve a different problem, where two plugins (or modules) need to interact at runtime with each other. In the case of painless, this is so that other plugins can define or extend the allowed list of classes and methods that are available to scripting contexts. For example, watcher adds some additional scripting contexts, and then adds watcher specific helper methods available to those scripts. In order to do this, the xpack-watcher and painless plugins need a way at runtime to communicate. The solution we came up with was to use SPI as the means to provide the additional functionality from watcher into painless. However, in order for painless to find the extension class defined by watcher, it must know about watcher's classloader at runtime. The extendedPlugins attribute is the way that link is configured. The magic happens when loading plugins, and the hierarchy is no longer flat. This is done, though, in as loose a way as possible, so that watcher can only see the extension classes of painless (this is the separate painless spi jar), and painless gets notified (see ExtensiblePlugin) when another classloader is ready that it should try to load its SPI extensions from.
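The handshake described above can be sketched as follows. The interface names here (ExtensiblePlugin, ExtensionLoader, PainlessExtension) loosely mirror the real Elasticsearch SPI but this is a simplified stand-in, not the actual API; in particular the real loader walks ServiceLoader over the extending plugin's classloader, which is faked here with an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

interface PainlessExtension {
    String contextName();
}

interface ExtensionLoader {
    <T> List<T> loadExtensions(Class<T> extensionPoint);
}

interface ExtensiblePlugin {
    // Called by the plugin service once the extending plugin's classloader is ready.
    void loadExtensions(ExtensionLoader loader);
}

class PainlessLikePlugin implements ExtensiblePlugin {
    final List<PainlessExtension> contexts = new ArrayList<>();

    @Override
    public void loadExtensions(ExtensionLoader loader) {
        // Real painless would discover these via SPI from the extending
        // plugin's classloader; here we take whatever the loader provides.
        contexts.addAll(loader.loadExtensions(PainlessExtension.class));
    }
}

public class ExtensionDemo {
    public static void main(String[] args) {
        // A watcher-like plugin contributing one extra scripting context.
        PainlessExtension watcherContext = () -> "watcher_script";

        ExtensionLoader fakeLoader = new ExtensionLoader() {
            @SuppressWarnings("unchecked")
            public <T> List<T> loadExtensions(Class<T> point) {
                return point == PainlessExtension.class
                        ? List.of((T) watcherContext) : List.of();
            }
        };

        PainlessLikePlugin painless = new PainlessLikePlugin();
        painless.loadExtensions(fakeLoader);
        System.out.println(painless.contexts.get(0).contextName());
    }
}
```

Note the direction of the coupling: the extended plugin (painless) only sees the extension point types, and the extending plugin (watcher) only compiles against painless's SPI jar.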

While the jar duplication issue is a problem, extended plugins are not the solution. That should remain for when two plugins need to actually interact. If two modules or plugins need to share some code, we should look at shared libraries for that. Currently, only libraries depended on by server are loaded once, but I hope with the module layers coming out of modularization, as well as the implementation hiding we have shown with x-content, that we will make shared libraries easier to build and use.

Side note: the kibana module doesn't seem to actually need the reindex module (it compiles fine without it). I will open a PR for that.

mark-vieira commented 2 years ago

You mention that classpath visibility across extended plugins is limited but it's not clear to me how exactly. For example, the x-pack spatial module extends the legacy-geo module and declares it compileOnly as expected. It then directly references classes in that module. It would appear to me that the extended module is visible on its classpath, which explains why it's not necessary to bundle a duplicate jar here.

In short, it's not clear to me what kind of dependency on another module/plugin requires bundling it rather than extending it.

rjernst commented 2 years ago

The extended plugins mechanism has been abused, as it is in the spatial/legacy-geo case. But that does not mean that we should advertise it as a general purpose "let's share code across modules".