SAP / cf-java-logging-support

The Java Logging Support for Cloud Foundry supports the creation of structured log messages and the collection of request metrics
Apache License 2.0
77 stars 48 forks source link

Unexpected stall on logback warning #179

Closed blackalkane closed 5 months ago

blackalkane commented 5 months ago

Hi

I have one general question: is the logging mechanism using cf-java-logging-support-logback either writing to file/disk or using a synchronous, blocking approach?

Our service's /health endpoint is stalling for multiple seconds. The code execution analysis showed Disk I/O operation from logger warning. We just use cf-java-logging-support-logback as is.

Thanks

KarstenSchnitter commented 5 months ago

Hi,

thanks for raising this issue. The answer depends on your additional logback configuration.

TL;DR: The library does not add synchronization, but Logback synchronizes writes on OutputStream, most likely stdout in your case.

The provided JsonEncoder is not using any synchronous blocks itself. It does access the MDC, which is a Java ThreadLocal, but this is non-blocking. However, the sample configuration uses the logback ConsoleAppender, which uses a synchronization on stdout. You can work around that, by wrapping the ConsoleAppender in a AsyncAppender:

<configuration debug="false" scan="false">
   <appender name="STDOUT-JSON" class="ch.qos.logback.core.ConsoleAppender">
      <encoder class="com.sap.hcp.cf.logback.encoder.JsonEncoder"/>
   </appender>
   <appender name="ASYNC-JSON" class="ch.qos.logback.classic.AsyncAppenderr">
      <appender-ref ref="STDOUT-JSON" />   </appender>
   </appender>
   <root level="${LOG_ROOT_LEVEL:-WARN}">
      <appender-ref ref="ASYNC-JSON" />
   </root>
</configuration>

Note, that there are several caveats with the AsyncAppender with regards to event loss. Please read up on that on its documentation.

If I am missing something here, please share details of your analysis.

Best Regards, Karsten

jandreas0 commented 5 months ago

Hi @KarstenSchnitter Thank you for this confirmation. Our appender configuration looks like this:

<appender name="STDOUT-JSON"
        class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="com.sap.hcp.cf.logback.encoder.JsonEncoder" />
    </appender>
    <appender name="STDOUT"
        class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            ...
        </encoder>
    </appender>

Would you have any insights/ideas why our Dynatrace logging produces the following trace: log_stack_trace

This, to me, seems to be indicating a write-to-file (using FileOutputStream.writeBytes). This method takes up 100% of the 2.71s of execution time. Dynatrace is indicating that the entire execution time (100%) is spent in "Disk I/O".

KarstenSchnitter commented 5 months ago

The call stack determined by Dynatrace is correct. The ConsoleAppender will use System.out or System.err to get the OutputStream to write to. This is a cascade of different stream types: PrintStream -> BufferedOutputStream -> FileOutputSream. The ultimate method in your call stack FileOutputStream.write is actually a native method. It is no longer executing Java code, but a system library that creates the necessary syscalls.

The ultra-long execution time can be caused by massive load, for example, a huge amount of data written. Another reason might be general congestion of your execution environment, for example, the operating system or the hardware. If you are running in a cloud environment, there might be a noisy neighbour on the same machine. For example in CloudFoundry, there is no I/O limitation for applications. It would be interesting to know, how reproducible your issue is. Do you always see it with your application or just in some running instances? This might give you a lead on finding the root cause of your issue.

Coming back to this library: Your Dynatrace call stack does not contain any reference to cf-java-logging-support. This should have been the case in the first three lines. The encoder is called after filtering the messages and before handing it of to the appenders. Either Dynatrace is not picking this up, or you are running without the library altogether. In both cases, there is no performance issue with cf-java-logging-support. Maybe you can get more help with the JVM provider or your cloud provider, if this is used.