As reported in https://github.com/jruby/jruby/issues/8061#issuecomment-1908807511, JDK 21's LinkedHashMap introduces a new method (map) that is not present in JDK 17 and that interferes with JRuby's map method.
As reported in https://github.com/jruby/jruby/issues/8061#issuecomment-1933009986, the fix will be included in JRuby 9.4.6.0.
The temporary fix is to add
java.util.LinkedHashSet.remove_method(:map) rescue nil
to the rspec bootstrap script: https://github.com/elastic/logstash/blob/4e98aa811701cb9940984d2b43d62ee81d46c6b0/lib/bootstrap/rspec.rb#L18
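To see the clash in isolation, here is a minimal JRuby repro sketch (assuming an affected JRuby on JDK 21; the expected result under a fixed JRuby is shown in the comment):

# Enumerable#map should win here; on the affected JRuby/JDK 21 combination
# the Java-side method shadows it, which is what the remove_method
# workaround above undoes.
set = java.util.LinkedHashSet.new
set.add("a")
set.add("b")
puts set.map { |e| e.upcase }.inspect # => ["A", "B"] once fixed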
JDK 17 introduced the flag G1UsePreventiveGC to resolve a problem in G1 evacuation when there are a lot of short-lived humongous objects (humongous meaning an object bigger than 1/2 of a region size). As discussed in https://tschatzl.github.io/2021/09/16/jdk17-g1-parallel-gc-changes.html, the problem shows up as 0 objects copied during the evacuation phase: the count of such objects rises so quickly that no Eden or Survivor regions are available to move them into, so a Full GC (which stops the world) is needed to do in-place compaction.
The flag was introduced to trigger preventive, unscheduled GC cycles before humongous objects saturate the humongous regions, essentially preserving space to copy objects into during evacuation and avoiding a Full GC.
With JDK 20 the flag was deprecated and defaulted to false; with JDK 21 it has been removed.
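A quick way to check whether the JVM in use still knows the flag (a hedged Ruby one-liner; -XX:+PrintFlagsFinal lists every flag the JVM recognizes, so this prints a line on JDK 17 and nothing on JDK 21):

# Grep the JVM's flag table for the preventive-GC flag.
puts `java -XX:+PrintFlagsFinal -version 2>/dev/null`.lines.grep(/G1UsePreventiveGC/)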
Elasticsearch data nodes load a lot of 4 MB byte[] chunks of data to be passed down to the ML node (but it happens in other cases too, not only the ML one). This generates a lot of humongous allocations (humongous objects are objects with size >= 1/2 of the region size). In general such an allocation spike would generate an OOM error in the JVM, but ES is able to protect against it with a circuit breaker, and that is exactly what showed up: a lot of circuit breaker exceptions, with memory staying high instead of being freed and kept low by the G1 preventive collection phases.
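The allocation pattern is easy to reproduce from JRuby itself (a hedged sketch, not ES code; by the definition above a 4 MB byte[] is humongous whenever the G1 region size is 8 MB or smaller):

# Allocate short-lived 4 MB Java byte arrays in a loop; the reference is
# dropped every iteration, so each object dies young. Run with e.g.
#   jruby -J-Xmx256m -J-Xlog:gc humongous.rb
# to watch G1 deal with the humongous churn.
1_000.times do
  chunk = Java::byte[4 * 1024 * 1024].new # one humongous allocation
end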
How ES solved the issue: ES is resolving this by trying to allocate fewer humongous objects.
Logstash has some peculiarities:
Queue full case: if the queue is full and is back-pressuring the input, then at a certain point the allocation rate is not high; the references sit in the queue for relatively long periods, so those objects likely transition into tenured regions (old generation) and don't get any benefit from preventive GCs.
So from this perspective, having preventive GCs or not doesn't provide any improvement.
Queue empty and fast consumers: in this case the queue is almost empty, because consumers are able to keep up with producers. When the allocation rate is high and the pipeline queues have enough space to keep all the events live (big objects >= 2 MB), and given that there is no circuit breaker protection, preventive GCs offer limited relief: a JVM hosting Logstash is destined to go OOM unless the allocation rate is limited preemptively.
So also in this case, having preventive GCs or not doesn't provide improvements.
Given the discussion above, preventive GCs don't play an important role in Logstash memory management.
Used the following pipeline, which is pretty fast and keeps the queue mostly empty:
input {
  http {
    response_headers => {"Content-Type" => "application/json"}
    ecs_compatibility => disabled
  }
}
output {
  sink {}
}
Created a 4 MB file consisting of a single line of text.
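One way to produce such a file (a Ruby sketch; input_sample.txt is the name the Lua script below expects):

# 4 MB of 'a' characters on a single line, no trailing newline.
File.write("input_sample.txt", "a" * (4 * 1024 * 1024))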
Ran wrk with the following Lua script:
wrk.method = "POST"
-- read the whole 4 MB sample once and reuse it as the request body
local f = io.open("input_sample.txt", "r")
wrk.body = f:read("*all")
f:close()
wrk --threads 4 --connections 12 -d10m -s wrk_send_file.lua --latency http://localhost:8080
Reopened because it was inadvertently closed by #15719.
Closing this issue since Logstash will now support JDK 21. The discussion to decide whether we make it the default is followed up in a different thread.
Java 21 is now available and we would like to make it the default for Logstash. However, we need to investigate whether that is possible, provided JRuby supports it.
Deprecation list: https://docs.oracle.com/en/java/javase/21/docs/api/deprecated-list.html
Dependent tasks:
Depending tasks:
Other tasks:
getId() has been deprecated since JDK 19 and is replaced by threadId(), starting from JDK 21.
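From JRuby the migration is mechanical (a hedged sketch; JRuby exposes Java's threadId() through its usual snake_case mapping):

# java.lang.Thread proxy: getId() is deprecated since JDK 19,
# threadId() (thread_id in JRuby) is the replacement added in JDK 19.
t = java.lang.Thread.current_thread
puts t.thread_id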