DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0
580 stars 288 forks source link

TestNG CI Visibility onTestFinish event failure #5562

Closed EdwardOlmos closed 1 year ago

EdwardOlmos commented 1 year ago

Tested Versions: Java: 11, 17 TestNG: 7.50 dd-java-agent.jar: 1.17.0

Problem Statement Occasionally the following exception error happens when a test case finishes (4 out of 2000 tests). These tests are running in parallel at the suite level. The Datadog CI Visibility isn't uploading these test case results. However, my TestNG testng-results.xml is parsing the failures correctly.

The error I'm getting is similar to https://github.com/DataDog/dd-trace-java/issues/5268. But this is happening onTestFinish instead of onTestSuiteFinish.

<!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd" >

<suite name="Suite Name" configfailurepolicy="continue" parallel="tests" thread-count="2" data-provider-thread-count="3" >
    <test name="Test 1" parallel="methods" thread-count="4" >
        <classes>
            <class name="class1"/>
            <class name="class2"/>
        </classes>
    </test>

    <test name="Test 2" parallel="methods" thread-count="4" >
        <classes>
            <class name="class3"/>
        </classes>
    </test>

    <test name="Test 3" parallel="methods" thread-count="4" >
        <classes>
            <class name="class4"/>
        </classes>
    </test>
</suite>
java.lang.IllegalStateException: Active scope does not correspond to the finished test, it is possible that end() was called multiple times or an operation that was started by the test is still in progress; active scope span is: DDSpan [ t_id=7998298566848127145, s_id=8761638396407781800, p_id=0 ] trace=* tags={_dd.profiling.enabled=0, _dd.trace_span_attribute_schema=0, _sample_rate=1, ci.workspace_path=*, component=testng, env=local, git.branch=*, git.commit.author.date=2023-07-10T15:21:01.000Z, git.commit.author.email=*, git.commit.author.name=*, git.commit.committer.date=2023-07-10T15:21:01.000Z, git.commit.committer.email=*, git.commit.committer.name=*, git.commit.message=*, git.commit.sha=*, git.repository_url=*, language=jvm, os.architecture=amd64, os.platform=Linux, os.version=4.14.311-233.529.amzn2.x86_64, process_id=10702, runtime-id=c67cd2b0-6626-4816-b27f-d42892e0a6d2, runtime.name=OpenJDK Runtime Environment, runtime.vendor=Amazon.com Inc., runtime.version=17.0.6, span.kind=test, test.framework=testng, test.framework_version=7.5, test.module=*, test.name=*, test.parameters={"arguments":{"0":"material"}}, test.source.end=91, test.source.file=*, test.source.start=77, test.status=pass, test.suite=com.warehouse.api.v1.ContentItemsWithNewContentEndpoints, test.traits={"category":["FeatureToggleFlip"]}, test.type=test, test_module_id=5028251759832080046, test_suite_id=4565127199824026067, thread.id=63, thread.name=TestNG-test=ContentItems-1}, duration_ns=0, forceKeep=false

at datadog.trace.civisibility.DDTestImpl.end(DDTestImpl.java:157)
at datadog.trace.civisibility.events.TestEventsHandlerImpl.onTestFinish(TestEventsHandlerImpl.java:300)
at datadog.trace.instrumentation.testng.TracingListener.onTestSuccess(TracingListener.java:103)
at org.testng.internal.TestListenerHelper.runTestListeners(TestListenerHelper.java:103)
at org.testng.internal.invokers.TestInvoker.runTestResultListener(TestInvoker.java:276)
at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:744)
at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
nikita-tkachenko-datadog commented 1 year ago

Hey @EdwardOlmos, thank you for reporting this. There's a couple of questions I'd like to ask in order to understand the scope of the problem better.

Do you happen to have the full java.lang.IllegalStateException: Active scope does not correspond to the finished test ... message? The one you pasted appears to be truncated.

Does the problem manifest itself only with parameterised tests, or does it happen to regular test cases too?

Thanks!

nikita-tkachenko-datadog commented 1 year ago

And on top of that a couple of questions with regards to your project's TestNG XML config:

Do I understand correctly that there are multiple <test/> tags, each including a set of test cases?

If so, is it possible that a test case exists, that is included in more than one <test/> tags? In other words, is there a possibility that two or more tests running in parallel might be executing the same test case?

nikita-tkachenko-datadog commented 1 year ago

@EdwardOlmos, my current hypothesis is there is a test case in your project that gets included into multiple <test/> configurations.

I have fixed the tracer to properly handle such scenarios. The fix is included in release 1.18.1.

Please let me know if the issue persists when the new tracer version is used. Thanks!

wesrice commented 1 year ago

👋 I was experiencing this issue earlier today and came here to report it. I upgraded to the latest release, and I no longer experience the issue.

Thanks so much! Cheers.

EdwardOlmos commented 1 year ago

The issue persists for me after upgrading to 1.18.1. I updated my first comment’s TestNG suite and combined the error message code block since it was one message originally.

I noticed that I don’t get this error when I removed the TestNG parallelization on methods attribute and thread count for the tags. The tests with DataProviders are the ones having this issue if I do parallelize methods.

nikita-tkachenko-datadog commented 1 year ago

@EdwardOlmos, I am having troubles reproducing it locally with 1.18.1.

Could you please check if it's possible to get the full exception message, it looks like the one you provided was truncated. I expect the full message to look like Active scope does not correspond to the finished test, it is possible that end() was called multiple times or an operation that was started by the test is still in progress; active scope span is: ...; expected span is: ....

Also, would it be possible to see your Gradle build output?

EdwardOlmos commented 1 year ago

Here's another test with the full exception message. Another thing to mention is that we are changing the TestName for tests with DataProviders by appending For{ProviderClass} on the test name using the @BeforeMethod TestNG Annotation.

Classname: java.lang.IllegalStateException
Message: Active scope does not correspond to the finished test, it is possible that end() was called multiple times or an operation that was started by the test is still in progress; active scope span is: DDSpan [ t_id=2517750863621473666, s_id=2711050245010811015, p_id=0 ] trace=warehouseapitest/testng.test/com.warehouse.api.GDataDownloadWithNewContentEndpoints.shouldGetLtThumbnailOnEntityWithNewContentEndpoints tags={_dd.profiling.enabled=0, _dd.trace_span_attribute_schema=0, _sample_rate=1, ci.workspace_path=*/warehouseplatform, component=testng, env=local, git.branch=develop, git.commit.author.date=2023-07-12T22:15:47.000Z, git.commit.author.email=*, git.commit.author.name=*, git.commit.committer.date=2023-07-12T22:15:47.000Z, git.commit.committer.email=*, git.commit.committer.name=*, git.commit.message=*
, git.commit.sha=*, git.repository_url=*, language=jvm, os.architecture=amd64, os.platform=Linux, os.version=4.14.311-233.529.amzn2.x86_64, process_id=21783, runtime-id=e0846919-bafa-4c8f-9fb6-6beb9082307f, runtime.name=OpenJDK Runtime Environment, runtime.vendor=Amazon.com Inc., runtime.version=17.0.6, span.kind=test, test.framework=testng, test.framework_version=7.5, test.module=*/warehouseapitest, test.name=shouldGetLtThumbnailOnEntityWithNewContentEndpoints, test.source.end=71, test.source.file=*/warehouseapitest/src/main/java/com/warehouse/api/GDataDownloadWithNewContentEndpoints.java, test.source.start=64, test.status=pass, test.suite=com.warehouse.api.GDataDownloadWithNewContentEndpoints, test.traits={"category":["FeatureToggleFlip"]}, test.type=test, test_module_id=22721666963994446, test_suite_id=4538224138476274292, thread.id=68, thread.name=TestNG-test=GDataDownloads-3}, duration_ns=0, forceKeep=false; expected span is: DDSpan [ t_id=8031503532708521376, s_id=7619286721755720373, p_id=0 ] trace=warehouseapitest/testng.test/com.warehouse.api.GDataDownloadWithNewContentEndpoints.shouldGetStThumbnailOnEntityWithNewContentEndpoints tags={_dd.profiling.enabled=0, _dd.trace_span_attribute_schema=0, _sample_rate=1, ci.workspace_path=*/warehouseplatform, component=testng, env=local, git.branch=develop, git.commit.author.date=2023-07-12T22:15:47.000Z, git.commit.author.email=*, git.commit.author.name=*, git.commit.committer.date=2023-07-12T22:15:47.000Z, git.commit.committer.email=*, git.commit.committer.name=*, git.commit.message=*
, git.commit.sha=*, git.repository_url=*, language=jvm, os.architecture=amd64, os.platform=Linux, os.version=4.14.311-233.529.amzn2.x86_64, process_id=21783, runtime-id=e0846919-bafa-4c8f-9fb6-6beb9082307f, runtime.name=OpenJDK Runtime Environment, runtime.vendor=Amazon.com Inc., runtime.version=17.0.6, span.kind=test, test.framework=testng, test.framework_version=7.5, test.module=*/warehouseapitest, test.name=shouldGetStThumbnailOnEntityWithNewContentEndpoints, test.source.end=61, test.source.file=*/warehouseapitest/src/main/java/com/warehouse/api/GDataDownloadWithNewContentEndpoints.java, test.source.start=54, test.status=pass, test.suite=com.warehouse.api.GDataDownloadWithNewContentEndpoints, test.traits={"category":["FeatureToggleFlip"]}, test.type=test, test_module_id=22721666963994446, test_suite_id=4538224138476274292, thread.id=69, thread.name=TestNG-test=GDataDownloads-4}, duration_ns=0, forceKeep=false
Stacktrace:
java.lang.IllegalStateException: Active scope does not correspond to the finished test, it is possible that end() was called multiple times or an operation that was started by the test is still in progress; active scope span is: DDSpan [ t_id=2517750863621473666, s_id=2711050245010811015, p_id=0 ] trace=warehouseapitest/testng.test/com.warehouse.api.GDataDownloadWithNewContentEndpoints.shouldGetLtThumbnailOnEntityWithNewContentEndpoints tags={_dd.profiling.enabled=0, _dd.trace_span_attribute_schema=0, _sample_rate=1, ci.workspace_path=*/warehouseplatform, component=testng, env=local, git.branch=develop, git.commit.author.date=2023-07-12T22:15:47.000Z, git.commit.author.email=*, git.commit.author.name=*, git.commit.committer.date=2023-07-12T22:15:47.000Z, git.commit.committer.email=*, git.commit.committer.name=*, git.commit.message=*
, git.commit.sha=*, git.repository_url=*, language=jvm, os.architecture=amd64, os.platform=Linux, os.version=4.14.311-233.529.amzn2.x86_64, process_id=21783, runtime-id=e0846919-bafa-4c8f-9fb6-6beb9082307f, runtime.name=OpenJDK Runtime Environment, runtime.vendor=Amazon.com Inc., runtime.version=17.0.6, span.kind=test, test.framework=testng, test.framework_version=7.5, test.module=*/warehouseapitest, test.name=shouldGetLtThumbnailOnEntityWithNewContentEndpoints, test.source.end=71, test.source.file=*/warehouseapitest/src/main/java/com/warehouse/api/GDataDownloadWithNewContentEndpoints.java, test.source.start=64, test.status=pass, test.suite=com.warehouse.api.GDataDownloadWithNewContentEndpoints, test.traits={"category":["FeatureToggleFlip"]}, test.type=test, test_module_id=22721666963994446, test_suite_id=4538224138476274292, thread.id=68, thread.name=TestNG-test=GDataDownloads-3}, duration_ns=0, forceKeep=false; expected span is: DDSpan [ t_id=8031503532708521376, s_id=7619286721755720373, p_id=0 ] trace=warehouseapitest/testng.test/com.warehouse.api.GDataDownloadWithNewContentEndpoints.shouldGetStThumbnailOnEntityWithNewContentEndpoints tags={_dd.profiling.enabled=0, _dd.trace_span_attribute_schema=0, _sample_rate=1, ci.workspace_path=*/warehouseplatform, component=testng, env=local, git.branch=develop, git.commit.author.date=2023-07-12T22:15:47.000Z, git.commit.author.email=*, git.commit.author.name=*, git.commit.committer.date=2023-07-12T22:15:47.000Z, git.commit.committer.email=*, git.commit.committer.name=*, git.commit.message=*
, git.commit.sha=*, git.repository_url=*, language=jvm, os.architecture=amd64, os.platform=Linux, os.version=4.14.311-233.529.amzn2.x86_64, process_id=21783, runtime-id=e0846919-bafa-4c8f-9fb6-6beb9082307f, runtime.name=OpenJDK Runtime Environment, runtime.vendor=Amazon.com Inc., runtime.version=17.0.6, span.kind=test, test.framework=testng, test.framework_version=7.5, test.module=*/warehouseapitest, test.name=shouldGetStThumbnailOnEntityWithNewContentEndpoints, test.source.end=61, test.source.file=*/warehouseapitest/src/main/java/com/warehouse/api/GDataDownloadWithNewContentEndpoints.java, test.source.start=54, test.status=pass, test.suite=com.warehouse.api.GDataDownloadWithNewContentEndpoints, test.traits={"category":["FeatureToggleFlip"]}, test.type=test, test_module_id=22721666963994446, test_suite_id=4538224138476274292, thread.id=69, thread.name=TestNG-test=GDataDownloads-4}, duration_ns=0, forceKeep=false
at datadog.trace.civisibility.DDTestImpl.end(DDTestImpl.java:157)
at datadog.trace.civisibility.events.TestEventsHandlerImpl.onTestFinish(TestEventsHandlerImpl.java:310)
at datadog.trace.instrumentation.testng.TracingListener.onTestSuccess(TracingListener.java:113)
at org.testng.internal.TestListenerHelper.runTestListeners(TestListenerHelper.java:103)
at org.testng.internal.invokers.TestInvoker.runTestResultListener(TestInvoker.java:276)
at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:744)
at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
nikita-tkachenko-datadog commented 1 year ago

we are changing the TestName for tests with DataProviders by appending For{ProviderClass} on the test name using the @BeforeMethod TestNG Annotation

Could you please elaborate on how exactly that is done? Do I understand correctly, that your test class implements org.testng.ITest and its getTestName method to supply a custom name for your test cases? What would be the implementation of getTestName in that case?

EdwardOlmos commented 1 year ago

Yea, sure, here's how I'm doing it.

@BeforeMethod(alwaysRun = true)
protected void setUpMethod(Method method, Object[] params) throws Exception {
  // Restore to default method name first.
  testName = method.getName();

  // Implements dynamic renaming of parameterized tests. Only look at Test annotations with a
  // DataProvider specified.
  // If we got our parameters from a DataProvider class, then ask it for the test name.
  // If we got our parameters from a DataProvider method, then call toString() on the first
  // parameter.
  Test annotation = method.getAnnotation(Test.class);
  if (annotation != null) {
    Class<?> clazz = annotation.dataProviderClass();
    if (DataProvider.class.isAssignableFrom(clazz)) {
      testName =
          ((DataProvider) clazz.newInstance()).getTestName(params, method.getName());
    } else if (!StringUtils.isEmpty(annotation.dataProvider()) && params.length > 0) {
      Object firstParam = params[0];
      if (firstParam instanceof DataProvider) {
        testName = ((DataProvider) firstParam).getTestName(params, method.getName());
      } else {
        testName = testName + StringUtils.capitalize(firstParam.toString());
      }
    }
  }
}
nikita-tkachenko-datadog commented 1 year ago

Thanks a lot for those details. I believe I have identified and fixed the problem in the tracer now. The fix is included in release 1.18.2.

Please let me know if the issue persists when the new tracer version is used. Thanks!

EdwardOlmos commented 1 year ago

It works with the new tracer version! Thank you so much for looking into this @nikita-tkachenko-datadog!