elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

[Maintenance] Update to Bundler 2.4 #14945

Closed andsel closed 1 year ago

andsel commented 1 year ago

A first attempt was made with PR #14894, but it was later reverted with #14942 because it caused an OOM during ./gradlew generatePluginsVersion.

This problem has appeared in the past and is mainly due to the way Bundler is used inside generatePluginsVersion: the plugins are installed one at a time. This is needed to create a descriptor of the versions installed in Logstash, so that the corresponding documentation can be generated. The problem is that something inside Bundler caches instances and keeps growing, so a patch was made, described in detail here.

With the update to Bundler 2.4 something internal changed and the patch is no longer effective. The offending change should be between 2.3.23 and 2.3.24, when DepProxy was removed.

There is some work in #14919 that can be used as a starting point to fix it.


PR #14995 reintroduces Bundler 2.4 but also ships a workaround that skips logstash-input-cloudwatch and cleans Bundler's Dsl internal caches.

This turned out to be a Bundler issue, tracked in https://github.com/rubygems/rubygems/issues/6601

andsel commented 1 year ago

cc @roaksoax @jsvd

andsel commented 1 year ago

Starting from Bundler version 2.3.24 the DepProxy class was removed (with PR https://github.com/rubygems/rubygems/pull/5698), so I tested again with the changes proposed in #14919 also applied. With both 2.3.24 and 2.3.26, the command

./gradlew generatePluginsVersion

terminates after a long time (~45 min on a local machine), but without OOM.

Starting from version 2.4, Bundler switched its resolution engine from Molinillo to PubGrub (https://github.com/rubygems/rubygems/pull/5960). Running with the latest version (2.4.9 at the time of writing), the task either runs forever or crashes with OOM.

andsel commented 1 year ago

(screenshot: tracking_leak_1)

Running JRuby with class reification enabled

./gradlew -Djruby.reify.classes=true clean generatePluginsVersion

and analyzing the dump, it seems that a lot of PubGrub::Term instances are kept alive from the chain:

This has to be proven by calculating the retained size of the VersionSolver instances and understanding where and why they are cached.
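As a rougher cross-check than a retained-size calculation, live instances of a class can be counted from inside the process. The following is a minimal sketch, not the method used in this investigation: ToyTerm and live_instances are hypothetical names standing in for PubGrub::Term and a helper, and it assumes a runtime where ObjectSpace iteration is available (on JRuby it may need to be enabled explicitly).

```ruby
# Hypothetical stand-in class; in the real investigation the class of
# interest would be PubGrub::Term (an assumption, not exercised here).
class ToyTerm; end

# Count the instances of klass that are still alive after a GC pass.
def live_instances(klass)
  GC.start
  ObjectSpace.each_object(klass).count
end

# Keep 500 instances reachable through an array, mimicking a cache
# that holds on to objects.
retained = Array.new(500) { ToyTerm.new }

puts "live ToyTerm instances: #{live_instances(ToyTerm)}"
puts "explicitly retained:    #{retained.size}"
```

A count that keeps growing between plugin installations would point at a cache that is never cleared, but it does not say who holds the references; for that a heap dump is still needed.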

andsel commented 1 year ago

Using Bundler 2.4.10 with the following Gemfile consistently resulted in OOM:

source "https://rubygems.org"
gem "logstash-integration-aws"
gem "logstash-input-cloudwatch"

After installing Bundler 2.4.10 with

gem install bundler -v 2.4.10

and running with resolver debugging on:

DEBUG_RESOLVER=1 bundle _2.4.10_ install --verbose

it turns out that, after a lot of backtracking for each version of the aws-sdk-resources gem from 2.11.632 down to 2.0.23

conflict: aws-sdk-resources >= 2.11.632, < 3.0.0 depends on aws-sdk-core = 2.11.632
backtracking to 5
derived: not aws-sdk-resources >= 2.11.632, < 3.0.0
.
.
.
conflict: aws-sdk-resources < 2.0.23 depends on aws-sdk-core = 2.0.22
backtracking to 5
derived: not aws-sdk-resources < 2.0.23

ending in a resolution conflict, the log remains in an infinite loop, printing

! not logstash-mixin-aws >= 0.1.7 is partially satisfied by not logstash-mixin-aws < 0.1.7
! which is caused by logstash-mixin-aws < 0.1.7 depends on logstash >= 1.4.0, < 2.0.0
! thus logstash-input-cloudwatch < 2.1.0 requires logstash >= 1.4.0, < 2.0.0 or logstash-mixin-aws >= 0.1.7

These log lines come from https://github.com/rubygems/rubygems/blob/b2fe65b131493dcd22be49ea060596c206d269e9/bundler/lib/bundler/vendor/pub_grub/lib/pub_grub/version_solver.rb#L227-L229 and the reason it stays stuck in the loop while memory keeps increasing is the following code:

while !incompatibility.failure?
  ...
  incompatibility = Incompatibility.new(new_terms, cause: Incompatibility::ConflictCause.new(incompatibility, most_recent_satisfier.cause))
  ...
  logger.info { "! #{most_recent_term} is#{partially} satisfied by #{most_recent_satisfier.term}" }
  logger.info { "! which is caused by #{most_recent_satisfier.cause}" }
  logger.info { "! thus #{incompatibility}" }
end  

For every incompatibility instance created, failure? always evaluates to false, so the loop never exits, while the Incompatibility instances are kept alive by the new_terms variable; this causes the OutOfMemory error.
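The shape of that loop can be illustrated with toy stand-ins. This is only a sketch under stated assumptions, not Bundler's code: ToyIncompatibility and resolve_conflict are hypothetical names, failure? is simplified to an empty-terms check, and a max_iterations cap (absent in the real loop) is added so the example terminates. It shows that when each new incompatibility references the previous one as its cause and failure? never becomes true, every instance stays reachable.

```ruby
# Toy stand-in, NOT PubGrub's Incompatibility: it only models the terms
# list and the link to the incompatibility it was derived from.
ToyIncompatibility = Struct.new(:terms, :cause) do
  # Simplified predicate (assumption): only an empty term list is a failure.
  def failure?
    terms.empty?
  end

  # Number of incompatibilities reachable by following the cause links.
  def chain_length
    length = 0
    node = self
    while node
      length += 1
      node = node.cause
    end
    length
  end
end

# Same loop condition as in the excerpt above, plus a safety cap.
def resolve_conflict(incompatibility, max_iterations:)
  iterations = 0
  while !incompatibility.failure?
    break if iterations >= max_iterations # safety cap, hypothetical

    # The terms never shrink, so failure? never becomes true, and the
    # new instance keeps the previous one alive through its cause.
    incompatibility = ToyIncompatibility.new(incompatibility.terms.dup, incompatibility)
    iterations += 1
  end
  incompatibility
end

start = ToyIncompatibility.new([:"logstash-mixin-aws"], nil)
last = resolve_conflict(start, max_iterations: 1_000)
puts last.chain_length # 1001: the starting instance plus one per iteration
```

Without the cap, the chain would grow until the heap is exhausted, which matches the behaviour observed with the real resolver.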

Looking into the definition of failure? at https://github.com/rubygems/rubygems/blob/b2fe65b131493dcd22be49ea060596c206d269e9/bundler/lib/bundler/vendor/pub_grub/lib/pub_grub/incompatibility.rb#L36

def failure?
  terms.empty? || (terms.length == 1 && Package.root?(terms[0].package) && terms[0].positive?)
end

it turns out that:

roaksoax commented 1 year ago

Closing as bundler 2.4 is now used in Logstash 8.8. See https://github.com/elastic/logstash/issues/15003 for follow-up issues that need to be resolved once bundler releases a new version.