mattfield opened this issue 6 years ago
@mattfield, sorry for the problems you've experienced here. A significant number of PQ bugs were fixed in the 6.3.0 release, and we've had no reports of PQ problems on that or later releases, so I'd strongly recommend upgrading to 6.3.0 or later if you're using the PQ.
Hi @danhermann. We're running 6.3.2 and I just tried recreating our logstash containers, which resulted in the same "Page file size is too small to hold elements" error message. Any thoughts?
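For anyone else hitting this, one quick sanity check is to compare the configured queue page capacity against the page files Logstash has actually written to disk. A rough sketch, assuming a default package-style install (the settings and queue paths below are assumptions, not taken from this setup; container images usually keep data under /usr/share/logstash/data):

```sh
# Show the persistent-queue settings in effect (queue.type, page capacity, max bytes).
grep -E '^queue\.(type|page_capacity|max_bytes)' /etc/logstash/logstash.yml

# Inspect the on-disk page files for the default "main" pipeline.
# The PQ lives under ${path.data}/queue/<pipeline_id> unless path.queue is overridden.
ls -lh /var/lib/logstash/queue/main/
```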
Saw that same message again on our 6.3.2 staging logstash servers 😞
This also happened in staging, which is running LS & ES 6.6.2:
{"level":"ERROR","loggerName":"org.logstash.execution.AbstractPipelineExt","timeMillis":1557186424796,"thread":"Converge PipelineAction::Create<main>","logEvent":{"message":"Logstash failed to create queue."}}
{"level":"ERROR","loggerName":"logstash.agent","timeMillis":1557186424816,"thread":"Converge PipelineAction::Create<main>","logEvent":{"message":"Failed to execute action","action":{"metaClass":{"metaClass":{"metaClass":{"action":"PipelineAction::Create<main>","exception":"Java::JavaLang::IllegalStateException","message":"java.io.IOException: Page file size is too small to hold elements","backtrace":["org.logstash.execution.AbstractPipelineExt.openQueue(AbstractPipelineExt.java:170)","org.logstash.execution.AbstractPipelineExt$INVOKER$i$0$0$openQueue.call(AbstractPipelineExt$INVOKER$i$0$0$openQueue.gen)","org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:737)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$method$initialize$0(/app/logstash/logstash-core/lib/logstash/pipeline.rb:91)","org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)","org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)","org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:298)","org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:79)","org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:83)","org.jruby.RubyClass.newInstance(RubyClass.java:1022)","org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$block$execute$1(/app/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:43)","org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)","org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)","org.jruby.runtime.Block.call(Block.java:124)","org.jruby.RubyProc.call(RubyProc.java:289)","org.jruby.RubyProc.call19(RubyProc.java:273)","org.jruby.RubyProc$INVOKER$i$0$0$call19.call(RubyProc$INVOKER$i$0$0$call19.gen)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.agent.RUBY$block$exclusive$1(/app/logstash/logstash-core/lib/logstash/agent.rb:94)","org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)","org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:80)","org.jruby.runtime.Block.yieldSpecific(Block.java:134)","org.jruby.ext.thread.Mutex.synchronize(Mutex.java:148)","org.jruby.ext.thread.Mutex$INVOKER$i$0$0$synchronize.call(Mutex$INVOKER$i$0$0$synchronize.gen)","org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:498)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.agent.RUBY$method$exclusive$0(/app/logstash/logstash-core/lib/logstash/agent.rb:94)","app.logstash.logstash_minus_core.lib.logstash.agent.RUBY$method$exclusive$0$__VARARGS__(/app/logstash/logstash-core/lib/logstash/agent.rb)","org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)","org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$method$execute$0(/app/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:39)","app.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$method$execute$0$__VARARGS__(/app/logstash/logstash-core/lib/logs
tash/pipeline_action/create.rb)","org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)","org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)","org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)","app.logstash.logstash_minus_core.lib.logstash.agent.RUBY$block$converge_state$2(/app/logstash/logstash-core/lib/logstash/agent.rb:327)","org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)","org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)","org.jruby.runtime.Block.call(Block.java:124)","org.jruby.RubyProc.call(RubyProc.java:289)","org.jruby.RubyProc.call(RubyProc.java:246)","org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104)","java.lang.Thread.run(Thread.java:748)"]}}}}}}
{"level":"ERROR","loggerName":"logstash.agent","timeMillis":1557186424846,"thread":"LogStash::Runner","logEvent":{"message":"An exception happened when converging configuration","exception":{"metaClass":{"metaClass":{"exception":"LogStash::Error","message":"Don't know how to handle `Java::JavaLang::IllegalStateException` for `PipelineAction::Create<main>`","backtrace":["org/logstash/execution/ConvergeResultExt.java:103:in `create'","org/logstash/execution/ConvergeResultExt.java:34:in `add'","/app/logstash/logstash-core/lib/logstash/agent.rb:340:in `block in converge_state'"]}}}}}
Related issue: https://github.com/elastic/logstash/issues/8480
It looks like we might have been bitten by https://github.com/elastic/logstash/issues/9483 in our use of Logstash to ingest Filebeat data.
We noticed that all ingestion into one of our aggregation clusters had ceased yesterday evening. Looking first at our fleet of Beats, all of the ones we checked were showing the same error:
After verifying that no config changes had been made to either Beats or Logstash, we checked the events and queue details of one of our Logstash boxes, which had been active up to a certain point but had no events in its PQ:
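For reference, pipeline and queue counters like these can also be pulled from the Logstash node stats API. A minimal sketch, assuming the monitoring API is on its default host and port rather than anything specific to this setup:

```sh
# Per-pipeline stats, including the persistent queue's event count and size on disk.
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
```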
The destination ES cluster had been throwing the following, though when we checked, all of its thread pools were empty:
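Thread pool state on the ES side can be checked with the cat thread_pool API; a minimal sketch, with the host being a placeholder rather than anything from this cluster:

```sh
# Active/queued/rejected counts for the write thread pool on each node.
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected'
```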
When we restarted one of the Logstash processes, we saw the following error:
which led us to https://github.com/elastic/logstash/issues/9483 and https://github.com/elastic/logstash/issues/9220. Taking the advice from the former, we tried emptying out the contents of our queue directories, which seemed to at least get the processes back to a functional, stable state. However, when we restarted several other boxes in the fleet, they went into restart loops, throwing the following:
This then led us to https://github.com/elastic/logstash/issues/8098. In the end, we had to terminate and rebuild our EC2 boxes to get the fleet back to an operational state.
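For anyone trying the same workaround, a rough sketch of emptying a queue directory follows; the paths and service manager are assumptions for a package install, not taken from our EC2 setup, and note that this drops whatever events are still sitting in the PQ:

```sh
# Stop Logstash, move the bad queue data aside for the default "main" pipeline,
# then start again so a fresh queue is created. Any queued events are lost.
sudo systemctl stop logstash
sudo mv /var/lib/logstash/queue/main /var/lib/logstash/queue/main.bad.$(date +%s)
sudo systemctl start logstash
```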
Any additional insight would be great here.