Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

graylog2-server 0.20.3-1 consumes all memory and hangs #604

Closed ccastillo-contactpoint closed 10 years ago

ccastillo-contactpoint commented 10 years ago

Debian wheezy, fully upgraded, graylog2 installed from the Debian packages:

elasticsearch = 0.90.10
openjdk-7-jre = 7u55-2.4.7-1~deb7u1
graylog2-server = 0.20.3-1
ec2_instance_type = c3.2xlarge (15 GiB of RAM, no swap space)

/usr/bin/java -Xms8192m -Xmx8192m -jar /usr/share/graylog2-server/graylog2-server.jar -p /var/run/graylog2/graylog2-server.pid -f /etc/graylog2/server/server.conf

root@logs ~ # strace -p 1543
Process 1543 attached - interrupt to quit
futex(0x7f34a64089d0, FUTEX_WAIT, 1545, NULL

... process just hangs there, no more output from strace ...
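(For what it's worth, the main thread of a HotSpot JVM normally just parks on a futex while the worker threads do all the work, so strace on the main PID alone won't show much. Following all threads gives a better picture; a minimal sketch:)

strace -f -p 1543    # -f also traces the JVM's worker threads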

Here's a portion of the log (also on pastebin for better readability: http://pastebin.com/ZyRaFNya):

2014-07-01 23:49:31,221 INFO : org.elasticsearch.monitor.jvm - [graylog2-server] [gc][old][98390][568] duration [18.6s], collections [2]/[18.6s], total [18.6s]/[1.6h], memory [6.2gb]->[6.2gb]/[7.6gb], all_pools {[young] [910.9mb]->[910.5mb]/[911mb]}{[survivor] [0b]->[0b]/[910mb]}{[old] [5.3gb]->[5.3gb]/[5.3gb]}
2014-07-01 23:49:31,221 ERROR: org.graylog2.jersey.container.netty.NettyContainer - Uncaught exception during jersey resource handling
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.jboss.netty.channel.Channels.write(Channels.java:725)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:280)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
    at org.jboss.netty.channel.Channels.write(Channels.java:704)
    at org.jboss.netty.channel.Channels.write(Channels.java:671)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.graylog2.jersey.container.netty.NettyContainer$NettyResponseWriter$1.write(NettyContainer.java:118)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:229)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at java.io.BufferedWriter.flush(BufferedWriter.java:254)
    at org.glassfish.jersey.message.internal.ReaderWriter.writeToAsString(ReaderWriter.java:192)
    at org.glassfish.jersey.message.internal.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:129)
    at org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:99)
    at org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:59)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:243)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:230)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:103)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:88)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1139)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:574)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:381)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:371)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:262)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:318)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1010)
    at org.graylog2.jersey.container.netty.NettyContainer.messageReceived(NettyContainer.java:306)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

kroepke commented 10 years ago

Also see #597.

There seems to be a runaway process; this needs investigation.

dennisoelkers commented 10 years ago

Thanks for reporting this. Could you please take multiple stack dumps of the process while it is in this hanging state? You can do this by running "jstack <pid>" several times and piping the output to separate files. Being able to dig through these would help us a lot. Thanks!
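For example, something along these lines should work (a rough sketch; replace <pid> with the graylog2-server process id as shown by "jps -l"):

# take five thread dumps, ten seconds apart, into separate files
for i in 1 2 3 4 5; do
    jstack <pid> > /tmp/graylog2-jstack.$i.txt
    sleep 10
done

If jstack refuses to attach, "kill -3 <pid>" makes the JVM print a thread dump to its own stdout (i.e. into console.log) without stopping the process.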

ccastillo-contactpoint commented 10 years ago

dennisoelkers: sure, next time it hangs I'll produce the stack dumps and send them here. Thanks for your time ;-)

ccastillo-contactpoint commented 10 years ago

Hey dennisoelkers,

We currently have graylog2-server not fully crashing like before, but it isn't processing all the messages it receives either (it's actually dropping a lot of messages; load of 22 with 8 cores). I tried to produce the dumps with plain jstack but got nothing, then tried the -l -F flags and got something. The man page then said "If the given process is running on a 64-bit VM, you may need to specify the -J-d64 option", which I did, and got some output from the process (it's in the three pastebin links below).

http://pastebin.com/DTt8Mc0a http://pastebin.com/bRhnDPyj http://pastebin.com/Fs0wNnrj

root@logs ~ # jps -l
3438 /usr/share/graylog2-server/graylog2-server.jar
3917 play.core.server.NettyServer
27201 sun.tools.jps.Jps
2199 org.elasticsearch.bootstrap.ElasticSearch

root@logs ~ # jstack -l -F 3438
Attaching to process ID 3438, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Deadlock Detection:

No deadlocks found.

^CException in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
    at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: java.lang.RuntimeException: VM.initialize() was not yet called
    at sun.jvm.hotspot.runtime.VM.getVM(VM.java:398)
    at sun.jvm.hotspot.oops.Oop.getKlass(Oop.java:92)
    at sun.jvm.hotspot.oops.ObjectHeap$2.canInclude(ObjectHeap.java:419)
    at sun.jvm.hotspot.oops.ObjectHeap.iterateLiveRegions(ObjectHeap.java:490)
    at sun.jvm.hotspot.oops.ObjectHeap.iterateSubtypes(ObjectHeap.java:417)
    at sun.jvm.hotspot.oops.ObjectHeap.iterateObjectsOfKlass(ObjectHeap.java:260)
    at sun.jvm.hotspot.runtime.ConcurrentLocksPrinter.fillLocks(ConcurrentLocksPrinter.java:70)
    at sun.jvm.hotspot.runtime.ConcurrentLocksPrinter.<init>(ConcurrentLocksPrinter.java:36)
    at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:61)
    at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
    at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
    at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
    ... 6 more

Thanks !

ccastillo-contactpoint commented 10 years ago

It just completely crashed like before, so I produced another trace. I had to find another pastebin-like service that allows pasting more than 500 kB of text though...

http://paste.ubuntu.com/7762302/

kroepke commented 10 years ago

Unfortunately I simply cannot reproduce this. So far I've tried:

In both cases memory usage is stable at under 400MB. Garbage collection is not exactly fast, but the system was running with a profiler, which slows everything down; it was still responsive.

Thread count is stable, with the expected changes whenever an input starts or stops. In particular the old generation is very small; all garbage stays in the young generation and gets collected right away. This is with a completely untuned garbage collector.

The tests were done on my dev laptop. The graylog2 server had a 768MB heap and never used more than 400MB of it. Elasticsearch is completely untuned too, but running locally. The system has an SSD.

What makes me wonder about the heap usage your system reports is that almost all memory is used in the old generation. That would indicate that the system cannot write its messages to elasticsearch. Is that connection flaky for you? Are there any messages in the master caches? Is your elasticsearch system maybe too slow?
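If you want to rule out elasticsearch itself, a few quick checks might help (a sketch; host and port assume the default local elasticsearch setup, and iostat comes from the sysstat package):

# cluster status, node count and unassigned shards
curl -s 'http://localhost:9200/_cluster/health?pretty=true'
# per-node stats, including JVM heap and indexing figures
curl -s 'http://localhost:9200/_nodes/stats?pretty=true'
# disk utilisation while graylog2 is under load
iostat -x 5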

Thanks!

ccastillo-contactpoint commented 10 years ago

Hey kroepke,

Everything on that box runs locally: elasticsearch, graylog2-server, graylog2-web and mongodb. The box is an AWS c3.2xlarge (8 cores), which should be fine for our usage. A random dd from /dev/urandom to a temporary file gives a consistent 12 MB/s for writes on that box (3 EBS magnetic volumes in RAID-5, so we don't lose any data in case of a failure), which is more or less the same as the SATA-2 drive in my desktop, which gives about 16 MB/s. The box has no swap at all.

I ran dstat on the box to see if it was disk related; while I see peaks of 16 or even 19 MB/s for writes, those are not constant at all.

The graylog2-server process is almost constantly consuming 100% of the 8 cores the c3.2xlarge instance has.

Now to specifically respond to your questions:

1) There is no network connection to elasticsearch; all processes run locally.
2) How can I see if there are any messages in the "master caches"?
3) The system being too slow: could be, but we don't have more than 1000 or 2000 messages/second, and this box should be able to handle that load (I think, but correct me if I'm wrong).

Is there any metric, command or anything else you want me to run on the box to get a clearer picture of what's going on here?

Thanks man

ccastillo-contactpoint commented 10 years ago

I found where to get the number of messages in the "master cache" (sorry for the noob question). We're trying Oracle Java now instead of OpenJDK. I'll comment on how things go over the following days (or earlier if it crashes again).

ccastillo-contactpoint commented 10 years ago

Currently having:

2312008 messages in master cache. The JVM is using 5276 of 7923 MB heap space and will not attempt to use more than 7923 MB.

Another error in /var/log/graylog2-server/console.log:

2014-07-24 22:40:10,561 ERROR: org.graylog2.jersey.container.netty.NettyContainer - Uncaught exception during jersey resource handling
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.jboss.netty.channel.Channels.write(Channels.java:725)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:280)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
    at org.jboss.netty.channel.Channels.write(Channels.java:704)
    at org.jboss.netty.channel.Channels.write(Channels.java:671)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.graylog2.jersey.container.netty.NettyContainer$NettyResponseWriter$1.write(NettyContainer.java:118)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:229)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
    at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
    at java.io.BufferedWriter.write(BufferedWriter.java:230)
    at org.glassfish.jersey.message.internal.ReaderWriter.writeToAsString(ReaderWriter.java:191)
    at org.glassfish.jersey.message.internal.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:129)
    at org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:99)
    at org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:59)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:243)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:230)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:103)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:88)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:149)
    at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1139)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:574)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:381)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:371)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:262)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:318)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1010)
    at org.graylog2.jersey.container.netty.NettyContainer.messageReceived(NettyContainer.java:306)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

kroepke commented 10 years ago

Yeah, sorry for the noisy exception; we will clean this up. It just means that a connection has timed out, nothing more. All of this indicates that your setup is simply not processing messages fast enough. The next version will spool to disk and thereby avoid running out of memory. Right now the only option is to find out which part is slow. I suspect either a slow elasticsearch cluster (maybe add another node) or complex stream rules.


tgagor commented 10 years ago

I have a similar issue on graylog 0.20.6: 1 x graylog2-server (-Xms3584M -Xmx3584M), 1 x rabbitmq, 1 x graylog2-radio (-Xms3584M -Xmx3584M), 3 x elasticsearch (each part is on a single VPS).

Normally I have about 500~1000 msg/s (about 10~20% of heap space used) and everything works fine (empty master cache, low CPU usage). But from time to time (when running deployments, nightly crons, etc.) we see spikes of 10~15k msg/s for about 15 minutes (the maximum I saw was 27k msg/s, but only for a short time). This configuration can handle that for about 2~3 minutes. After that, graylog-server stops responding (radio and rabbit keep caching more messages for as long as they have resources). It looks like graylog-server fetches more messages than it can parse, and when memory is exhausted it hangs (with 100% CPU usage on a single core) trying to recover memory with GC, which never succeeds.
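One way to confirm that it really is a GC death spiral would be to enable GC logging on the graylog2-server JVM; a sketch of the extra HotSpot 7 flags (the log path is just an example):

# add to the graylog2-server java command line
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/graylog2-server/gc.log

Back-to-back full collections that reclaim almost nothing would confirm the heap is simply too small for the burst.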

I have many logs like this: http://pastebin.com/5q1F5hgL

I have this problem every night about midnight - let me know if you need more information.

When do you plan to release this disk caching update?

kroepke commented 10 years ago

We are planning to release a beta version within the next few weeks; nightly builds are already available, but I would only use those for really early testing and feedback.

You are using AMQP; could you please check what setting you have for the prefetch count? The default used to be 0, which means unlimited, and that can lead to OOM problems. 0.20.6 defaults to a lower value, but I'm not sure what happens with already existing inputs. Best to check the radio input config.
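If you want to verify what the consumers are actually using, rabbitmqctl can list the prefetch count per channel (a sketch; column names may differ slightly between RabbitMQ versions):

# a prefetch_count of 0 means unlimited
rabbitmqctl list_channels name prefetch_count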

Cheers, Kay


tgagor commented 10 years ago

I have prefetch set to 0; I see that the current default is 100. I will change this and let you know if it helped.

tgagor commented 10 years ago

Graylog stayed alive over the weekend, so it's much better.

Thank you.

kroepke commented 10 years ago

Great! Can I consider this solved then?

ccastillo-contactpoint commented 10 years ago

Those "2312008 messages in master cache" I reported eventually made it into elasticsearch. I'm using Oracle Java now, which gave better results than OpenJDK (it didn't crashed so far at least).

I'll wait for the master-cache-to-disk fix and report at a later time. Will be .deb packages to test that or will I have to go the source route for this ?

Thanks

kroepke commented 10 years ago

We will only supply tar files of the beta at this time, I believe, unless the community packages get built of course. The beta is planned for the coming week at the latest.

Best, Kay

Those "2312008 messages in master cache" I reported eventually made it into elasticsearch. I'm using Oracle Java now, which gave better results than OpenJDK (it didn't crashed so far at least).

I'll wait for the master-cache-to-disk fix and report at a later time. Will be .deb packages to test that or will I have to go the source route for this ?

Thanks

— Reply to this email directly or view it on GitHub https://github.com/Graylog2/graylog2-server/issues/604#issuecomment-50362397 .

ccastillo-contactpoint commented 10 years ago

Ok thanks man.

kroepke commented 10 years ago

Looks fixed by using a better prefetch count value. Please reopen if the problem reappears.

gravis commented 9 years ago

It's weird, we're seeing this after upgrading from 0.91 to 0.92 ("Uncaught exception during jersey resource handling").