apache / logging-log4j2

Apache Log4j 2 is a versatile, feature-rich, efficient logging API and backend for Java.
https://logging.apache.org/log4j/2.x/
Apache License 2.0
3.35k stars 1.59k forks source link

Threads contention because of OutputStreamManager.writeBytes #2447

Open bharathchandrathakkallapally-cpi opened 5 months ago

bharathchandrathakkallapally-cpi commented 5 months ago

Description

We are using the RollingFileAppender with below configuration

appender.file.type=RollingFile 
appender.file.name=LOGFILE
appender.file.fileName=${basePath}/out.log
appender.file.filePattern=${basePath}/out.log.%d{yyyy-MM-dd-HH}
appender.file.layout.type=PatternLayout
appender.file.layout.pattern=%d [%t] %-5p %c %x - %m%n
appender.file.policies.type=Policies
appender.file.policies.time.type=TimeBasedTriggeringPolicy
appender.file.policies.time.interval=1
appender.file.policies.time.modulate=true

Our application was crashing suddenly one day, we did not deploy any code changes, and nothing is changed in the environment. After attaching the profiler, we found that a single thread locked onto a resource at OutputStreamManager.writeBytes is blocking more than 100 threads which are trying to log, many more threads are in waiting state, eventually application is crashing. We tried restarting the application multiple times, we tried changing the hardware and also check whether we reached our storage limit or IO limits, nothing worked.

Eventual we fixed it by placing immediateFlush to false and root log level to warn. appender.file.immediateFlush=false

If we change the root log level to info, we are experiencing degraded performance even with immediateFlush false. Why did this happen? why suddenly and does using Async Logging help to get the performance back with info level logging? We are not using Async Appender or Async Logging currently.

Screenshot 2024-04-05 at 3 43 51 PM

Configuration

Version: log4j-core - 2.17.0 log4j-api - 2.17.0 log4j-jcl - 2.17.1 log4j-slf4j-impl - 2.17.1

Operating system: [OS and version] Ubuntu 20.04.2 LTS (Focal Fossa) JDK: [JDK distribution and version] openjdk version "1.8.0_402" OpenJDK Runtime Environment (build 1.8.0_402-8u402-ga-2ubuntu1~20.04-b06) OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

Logs

**Locked Thread**
http-nio-8080-exec-123 
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)

**108 Blocked Threads like below thread**
http-nio-8080-exec-276
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- waiting to lock <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)

Reproduction

We still didn't figured out what caused this issue suddenly.

tcmot commented 5 months ago

I think this is a deadlock.

jvz commented 5 months ago

Does this deadlock coincide with the hourly log file rollover you configured? I wonder if there's a race condition here between incoming log events and the rollover process.

bharathchandrathakkallapally-cpi commented 5 months ago

@tcmot according the thread dump analysis we did, there is no deadlock.

@jvz If it is related to the rollover, this issue should be only seen at the end of an hour when the rollover happens but we have seen threads getting blocked everytime we expose our service to traffic, not just at the end of an hour.

tcmot commented 5 months ago

Are all servers like this? Is there a problem with the hard drive? If there is no deadlock, it is likely that data cannot be written to the hard drive.

@bharathchandrathakkallapally-cpi

bharathchandrathakkallapally-cpi commented 5 months ago

@tcmot yes, all the servers in the cluster are facing this issue. There is no problem with the hard drive or IO, we had the AWS guy checked our quota limits and we even tried provisioning new instances, it didn't worked.

ppkarwasz commented 5 months ago

I think that the crucial information that is missing here, is what is thread http-nio-8080-exec-123 waiting for. The lock on RollingFileManager should be the last one required to log the event.

@bharathchandrathakkallapally-cpi, do you have additional data on the "blocking" thread? The jstack command should print something like this:

"http-nio-8080-exec-123" daemon prio=10 tid=0x00007f3e88448800 nid=0x56f5 waiting for monitor entry [0x00000000472bc000]
bharathchandrathakkallapally-cpi commented 5 months ago

@ppkarwasz here is the data your are asking

"http-nio-8080-exec-123" #2723 daemon prio=5 os_prio=0 tid=0x00007f7e94095800 nid=0x34a8f waiting for monitor entry [0x00007f7ea87c4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
    - locked <0x00000000c5509850> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
    at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
    at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
    at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
    at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
    at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
    at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:60)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
    at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
    at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
    at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:542)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:500)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:483)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:388)
    at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:153)
    at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:376)
    at org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:159)
ppkarwasz commented 5 months ago

@bharathchandrathakkallapally-cpi,

Any idea what kind of object 0x00007f7ea87c4000 might be? Does it appear in the stacktraces of other threads?

bharathchandrathakkallapally-cpi commented 5 months ago

sorry, the object 0x00007f7ea87c4000 only appeared once in the dump, and it is with the "http-nio-8080-exec-123" thread .