elastic / apm-agent-java

https://www.elastic.co/guide/en/apm/agent/java/current/index.html
Apache License 2.0
567 stars 320 forks source link

Race-Condition when mocking ReporterConfiguration with real tracer #2842

Closed JonasKunz closed 2 years ago

JonasKunz commented 2 years ago

Test method

All tests which use MockTracer.createRealTracer(reporter) and do mocking of the ReporterConfiguration are affected.

The concrete example from the logs would be all tests from MicrometerMetricsReporterTest.

Test logs

Click to expand ``` 15:23:58 [ERROR] Tests run: 27, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.221 s <<< FAILURE! - in co.elastic.apm.agent.micrometer.MicrometerMetricsReporterTest 15:23:58 [ERROR] testExclusionOfFailedGauge_singleGauge Time elapsed: 0.217 s <<< ERROR! 15:23:58 org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 15:23:58 15:23:58 Long cannot be returned by getApiKey() 15:23:58 getApiKey() should return String 15:23:58 *** 15:23:58 If you're unsure why you're getting above error read on. 15:23:58 Due to the nature of the syntax above problem might occur because: 15:23:58 1. This exception *might* occur in wrongly written multi-threaded tests. 15:23:58 Please refer to Mockito FAQ on limitations of concurrency testing. 15:23:58 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 15:23:58 - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method. 15:23:58 15:23:58 at co.elastic.apm.agent.micrometer.MicrometerMetricsReporterTest.setUp(MicrometerMetricsReporterTest.java:76) 15:23:58 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 15:23:58 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 15:23:58 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 15:23:58 at java.base/java.lang.reflect.Method.invoke(Method.java:566) 15:23:58 at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) 15:23:58 at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) 15:23:58 at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) 15:23:58 at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) 15:23:58 at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:128) 15:23:58 at org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeEachMethod(TimeoutExtension.java:78) 15:23:58 at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) 15:23:58 at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) 15:23:58 at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) 15:23:58 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) 15:23:58 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) 15:23:58 at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) 15:23:58 at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) 15:23:58 at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) 15:23:58 at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeMethodInExtensionContext(ClassBasedTestDescriptor.java:520) 15:23:58 at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$synthesizeBeforeEachMethodAdapter$23(ClassBasedTestDescriptor.java:505) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeEachMethods$3(TestMethodTestDescriptor.java:174) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$6(TestMethodTestDescriptor.java:202) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:202) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeEachMethods(TestMethodTestDescriptor.java:171) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:134) 15:23:58 at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) 15:23:58 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) 15:23:58 at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) 15:23:58 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) 15:23:58 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) 15:23:58 at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) 15:23:58 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) 15:23:58 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) 15:23:58 at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) 15:23:58 at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) 15:23:58 at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35) 15:23:58 at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57) 15:23:58 at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54) 15:23:58 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220) 15:23:58 at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188) 15:23:58 at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202) 15:23:58 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181) 15:23:58 at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128) 15:23:58 at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:150) 15:23:58 at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:124) 15:23:58 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) 15:23:58 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) 15:23:58 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) 15:23:58 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) ```

Additional context

The root cause seems to be a race-condition within Mockito, even when using the doReturn-syntax.

In this concrete example the problem is caused by these two lines :

tracer = MockTracer.createRealTracer(reporter);
doReturn(0L).when(tracer.getConfig(ReporterConfiguration.class)).getMetricsIntervalMs();

The race-condition here is that MockTracer.createRealTracer(reporter) starts at least one background thread which reads from the ReporterConfiguration, for example the ApmServerHealthChecker while we at the same time try to configure the mocks.

The ApmServerHealthChecker creates a background request to the APM-server on a different Thread, where it reads the configuration from the ReporterConfiguration (this line):

String apiKey = reporterConfiguration.getApiKey();

The race-condition occurs when the getApiKey() call is performed before the getMetricsIntervalMs() call, but after doReturn(0L).when(tracer.getConfig(ReporterConfiguration.class)). In this case Mockito thinks that we are trying to stub getApiKey().

I have the suspicion that the when() part of mockito is actually not thread safe for our case.

felixbarny commented 2 years ago

The race-condition here is that MockTracer.createRealTracer(reporter) starts at least one background thread which reads from the ReporterConfiguration, for example the ApmServerHealthChecker while we at the same time try to configure the mocks.

This is a long standing source of flakiness. Kudos for finding out the root cause 👏

Cloud we disable the background tasks, such as health checking within MockTracer.createRealTracer? Another option would be to block inside MockTracer.createRealTracer until the background tasks are finished.

JonasKunz commented 2 years ago

This is a long standing source of flakiness

In that case I will invest the time and try implementing your suggested fixes.

JonasKunz commented 2 years ago

In case of a regression, this specific issue can be reproduced reliably via the following steps:

  try {
      Thread.sleep(50);
  } catch (InterruptedException e) {
      throw new RuntimeException(e);
  }
  String secretToken = reporterConfiguration.getSecretToken();