elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
14.17k stars 3.49k forks source link

Isolate native Java and Ruby/Java mixed plugins classes #14530

Open andsel opened 1 year ago

andsel commented 1 year ago

Problem statement

When Logstash starts, it loads all core's jars and plugin's bundled jars, this creates a flat unique classpath, like in every plain Java application. Due to the extendable nature of Logstash, where a user can install various plugins, the classpath is composed of jars that are not identifiable at compile/build time. In particular, a plugin could bundle a library (name it alib) that's also bundled by another plugin or worst by Logstash core itself. Supposing that the alib shipped with the plugin is a different version from the one shipped by another plugin, and deprecates some methods used by the other client, (plugin or Logstash core) this ends in one of the 2 clients to fail. It could throw a runtime error such as "method not found" or other weird messages, depending on the order of class load, which can't be deterministically predefined.

This evidence a problem of interference, in particular also a problem of visibility. One plugin could use any public class exposed by any other of the components (core or plugins) while that class wasn't intended to be a published API.

Both problem originates from missed modularization and isolation.

Solution

There are available existing technologies to achieve modularization, for example OSGI, but are really dynamic and permit the loading and unloading of modules, plus introduces the concept of services API and implementation that modules can expose. Given that a Logstash plugin is installed with Logstash switched off and that plugins aren't installed and uninstalled during runtime, it's maybe an overkill; this needs to be discussed.

Benefits

Runtime and compile time isolation permit to avoid weird problems during execution, and permit the plugins to evolve independently, without incurring in jars-hell problem; plus clearly defines which is the intended and published API.

yaauie commented 1 year ago

My understanding is that this can be handled by the (currently opt-in) pipeline.plugin_classloaders setting (or corresponding --pipeline.plugin-classloaders command-line flag), although looking at its implementation I am concerned that it doesn't exactly solve the problem:

I like the idea of plugins being self-contained along with their own dependencies, largely because it makes a built artifact much more stable across Logstash releases, only needing to be re-built when our API changes. I would even rather the class loader for plugins not inherit from ours in order to prevent accidental reliance on our internal details, but doing so would almost certainly break existing plugins in the wild.

andsel commented 1 year ago

The PluginClassloader as it states in javadoc

Classloader for Java plugins to isolate their dependencies. Will not load any classes in the co.elastic.logstash or org.logstash packages to avoid the possibility of clashes with classes in Logstash core.

force the plugins to load Logstash's core classes from the app classloader while using it to load from jars that compose the plugin itself. This would permit to different plugins to use difference versions of same library and isolate plugins which defines classes in same path from each other.

However present 2 problems:

I would even rather the class loader for plugins not inherit from ours in order to prevent accidental reliance on our internal details,

At some point the plugins needs to access core classes, so for example logger classes or Logstash API definitions must be accessible and loadable from plugin's class space. We have to decide which classes we want to expose and the enforce that the only thing that JVM provide is the module system.