DataDog / datadog-ci

Use Datadog from your CI.
https://datadoghq.com
Apache License 2.0

JUnit Uploads use longest Test (testcase) time for suite duration instead of given JUnit testsuite time #1407

Open petems opened 3 months ago

petems commented 3 months ago

Bug description

Currently, there is no way to get an accurate "Wall Time" or full test-suite duration from a JUnit test upload at the suite level. The upload uses the longest testcase time and assumes all testcases run in parallel.

As far as I know, there is no way to mark testcases in a JUnit XML report as sequential, so using the longest test time will always be inaccurate.

This is mentioned in a caveat in the docs as "Total Test Time is Different Than Expected"

The total test time is different than expected

How total time is calculated

The total time is defined as the sum of the maximum test session durations.

  1. The maximum duration of a test session grouped by the test session fingerprint is calculated.
  2. The maximum test session durations are summed.
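
To make the quoted rule concrete, here is a minimal Python sketch of that calculation, assuming sessions arrive as (fingerprint, duration) pairs. The function name and the sample data are hypothetical, not Datadog's implementation:

from collections import defaultdict

def total_test_time(sessions):
    """Illustration of the documented rule: group sessions by fingerprint,
    take the maximum duration in each group, then sum the maxima.

    sessions: iterable of (fingerprint, duration_in_seconds) pairs.
    """
    max_by_fingerprint = defaultdict(float)
    for fingerprint, duration in sessions:
        max_by_fingerprint[fingerprint] = max(max_by_fingerprint[fingerprint], duration)
    return sum(max_by_fingerprint.values())

# Sessions sharing a fingerprint contribute only their maximum duration,
# so individual testcase times within a session are never summed.
print(total_test_time([("ci-job-a", 2.441), ("ci-job-a", 2.105), ("ci-job-b", 1.2)]))  # ~3.641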

However, a better default behaviour (IMHO) would be to use the given testsuite time for the suite duration. It wouldn't fix the flamegraph or show the tests in the correct sequential order, but it would make things like the duration graphs more accurate, which helps features like the GitHub PR commenter bot show regressions/improvements in overall suite time.

Describe what you expected

When I upload a JUnit report, I expect the suite time to match the testsuite time. Instead, the suite time is whatever the longest test (testcase) time is.

Steps to reproduce the issue

Take any JUnit XML test report; here's an example generated with pytest:

<?xml version="1.0" encoding="utf-8"?>
<testsuites>
    <testsuite name="pytest" errors="0" failures="1" skipped="0" tests="7" time="5.465"
        timestamp="2024-08-06T03:01:44.563882+01:00" hostname="COMP-YCY2KJ6MWN">
        <testcase classname="tests.accesability_test.TestAccessibility"
            name="test_accessibility_default_counts[chromium]" time="2.441">
            <failure
                message="AssertionError: Found accessibility violations exceeding the maximum allowed: {'moderate': 3}">self
                = &lt;tests.accesability_test.TestAccessibility object at 0x104672000&gt;,
                axe_playwright = &lt;utilities.axe_helper.AxeHelper object at 0x1056e5b20&gt;, page
                = &lt;Page url='https://www.saucedemo.com/'&gt;

                @allure.title("Test Accessibility with Default Counts")
                def test_accessibility_default_counts(self, axe_playwright, page):
                &gt; axe_playwright.check_accessibility(page)

                tests/accesability_test.py:8:
                _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
                _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
                _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

                self = &lt;utilities.axe_helper.AxeHelper object at 0x1056e5b20&gt;, page = &lt;Page
                url='https://www.saucedemo.com/'&gt;, maximum_allowed_violations_by_impact =
                {'critical': 0, 'minor': 0, 'moderate': 0, 'serious': 0}

                def check_accessibility(
                self, page: Page, maximum_allowed_violations_by_impact: Dict[str, int] = None
                ) -&gt; None:
                """Checks accessibility of the page using playwright axe.

                :param page: Playwright Page object
                :param maximum_allowed_violations_by_impact: A dictionary
                specifying the maximum allowed violations for each impact
                level (e.g., {'minor': 2, 'moderate': 1, 'serious': 0,
                'critical': 0}). If None, no violations will be allowed for
                any impact level.
                """
                if maximum_allowed_violations_by_impact is None:
                maximum_allowed_violations_by_impact = {
                "minor": 0,
                "moderate": 0,
                "serious": 0,
                "critical": 0,
                }
                results = self.axe.run(page)
                violations_count = dict(
                Counter(
                [violation["impact"] for violation in results.response["violations"]]
                )
                )
                if violations_exceeded := {
                impact_level: violation_count
                for impact_level, violation_count in violations_count.items()
                if violation_count
                &gt; maximum_allowed_violations_by_impact.get(impact_level, 0)
                }:
                allure.attach(
                body=json.dumps(results.response["violations"], indent=4),
                name="Accessibility Violation Results",
                attachment_type=allure.attachment_type.JSON,
                )
                &gt; assert not violations_exceeded, (
                f"Found accessibility violations exceeding the maximum allowed: "
                f"{violations_exceeded}"
                )
                E AssertionError: Found accessibility violations exceeding the maximum allowed:
                {'moderate': 3}

                utilities/axe_helper.py:51: AssertionError</failure>
        </testcase>
        <testcase classname="tests.accesability_test.TestAccessibility"
            name="test_accessibility_custom_counts[chromium]" time="0.433" />
        <testcase classname="tests.checkout_test.TestCheckout"
            name="test_checkout_counter[chromium-User.STANDARD_USER]" time="0.623" />
        <testcase classname="tests.inventory_test.TestInventory"
            name="test_inventory_page[chromium-User.STANDARD_USER]" time="0.423" />
        <testcase classname="tests.login_test.TestLogin" name="test_valid_login[chromium]"
            time="0.566" />
        <testcase classname="tests.login_test.TestLogin"
            name="test_login_error[chromium-invalid_password]" time="0.432" />
        <testcase classname="tests.login_test.TestLogin"
            name="test_login_error[chromium-locked_user]" time="0.523" />
    </testsuite>
</testsuites>

Expected: The test suite should be reported with a duration of 5.465 seconds.

Actual: The suite time is listed as 2.441, the longest individual testcase time.
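
To see the mismatch outside Datadog, a minimal Python sketch (standard library only; the report path is a placeholder) reads the report above and prints the three numbers involved:

import xml.etree.ElementTree as ET

# Parse the JUnit report shown above (path is a placeholder).
root = ET.parse("report.xml").getroot()
suite = root.find("testsuite")

testcase_times = [float(tc.get("time", 0)) for tc in suite.iter("testcase")]

print("testsuite time attribute:", float(suite.get("time")))  # 5.465 (pytest's reported wall time)
print("longest testcase:", max(testcase_times))               # 2.441 (what the suite duration shows)
print("sum of testcases:", sum(testcase_times))               # ~5.441 (close to the real wall time)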

Workaround: You can create a dashboard widget that takes the SUM of the individual test times to recover an accurate overall test suite time:

[Screenshot: dashboard widget summing individual test durations]

Additional context

No response

Command

junit

szymon-halik commented 2 months ago

This is exactly what is happening for our test suites, which run synchronously when we deploy to Salesforce environments. We are looking into tools that will help us reduce tech debt and introduce quality solutions across multiple Salesforce instances. We saw DataDog as a provider that might help us achieve that, but as we go along and discover more bugs and solutions that aren't ready for our use case, we are strongly considering dropping DataDog in favor of a smaller provider better suited for SF.

Here is what we receive from the SF CLI on deployment:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
    <testsuite name="force.apex" timestamp="2024-08-30T18:12:57.698Z" hostname="https://someinstance--qa.sandbox.my.salesforce.com" tests="1603" failures="0"  errors="0"  time="3178.06">
        <properties>
            <property name="commandTime" value="0.00 s"/>
            <property name="failRate" value="0.00%"/>
            <property name="failing" value="0"/>
            <property name="hostname" value="https://someinstance--qa.sandbox.my.salesforce.com"/>
            <property name="orgId" value="00DXXX"/>
            <property name="passRate" value="100%"/>
            <property name="passing" value="1603"/>
            <property name="skipped" value="0"/>
            <property name="testExecutionTime" value="3178.06 s"/>
            <property name="testStartTime" value="Fri Aug 30 2024 6:12:57 PM"/>
            <property name="testTotalTime" value="3178.06 s"/>
            <property name="testsRan" value="1603"/>
            <property name="username" value="xxx.cicd@abc.com"/>
        </properties>
        <testcase name="callingServiceReturnsCorrelationId" classname="ServiceImpTest" time="15.92">
       </testcase>
               <testcase name="callingServiceReturnsAccountId" classname="AnotherServiceImpTest" time="1.13">
       </testcase>
    </testsuite>
</testsuites>

The DataDog PR report looks like this: [Screenshot 2024-09-01 at 18:30:18]

This is confusing for us as well as for management. Of course you can build a dashboard, but just-in-time access to simple deployment data in the PR would make our lives with DataDog much easier.

@petems, what are the chances of prioritizing this in the coming month or two?

juan-fernandez commented 1 month ago

hey! Sorry for the slow response. We're talking about this internally. Once we have a response we'll let you know 😄

szymon-halik commented 1 month ago

No problem at all. I am aware of the pitfalls of product roadmap planning. Here are a few points regarding the Salesforce ecosystem and its DevOps constraints:

  1. Salesforce deployments and package installations always run in sequential mode. It's just the nature of the Salesforce Metadata API: https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_testing_best_practices.htm (see the important section).
  2. I would say that 90% of the Salesforce ecosystem runs deployments against long-living environments (sandboxes), so this investment from DataDog would cover most of the market.
  3. There are people using short-living environments (scratch orgs) to run their test suites via the Tooling API in parallel mode, which produces a correct JUnit XML file, so they can use the default DataDog behavior - but they are a minority.
  4. By adding a --sequential flag, DataDog would cover the whole landscape of the Salesforce ecosystem. ISVs usually apply more engineering patterns to recreate their orgs, so for them you would see point 3. Partners and customers (the majority of the ecosystem: those that just bought a Salesforce license for internal use, or the people helping them maintain it) will use the Metadata API for deployments, which produces the incorrect JUnit XML.

ManuelPalenzuelaDD commented 1 month ago

Hi @petems @szymon-halik ! We've released a change to how we interpret JUnitXML reports to take into account the time field reported by <testsuite> and <testsuites>. The behavior is the following:

  1. Use <testsuite> time field for Datadog test suite duration
  2. Use <testsuites> time field for Datadog test module and test session duration
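
As a rough illustration of my reading of that change (a sketch, not the actual datadog-ci implementation; report.xml is a placeholder path), the relevant attributes would now be picked up roughly like this:

import xml.etree.ElementTree as ET

# Sketch of the described behaviour, NOT datadog-ci code: prefer the explicit
# time attributes over anything derived from individual testcase durations.
root = ET.parse("report.xml").getroot()  # the <testsuites> element

session_time = root.get("time")  # used for module/session duration when present
for suite in root.iter("testsuite"):
    suite_time = suite.get("time")  # used for the suite duration, e.g. "3178.06" in the Salesforce report
    print(suite.get("name"), "suite duration:", suite_time, "session duration:", session_time)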

Thank you for suggesting this, and I hope this makes your experience with JUnitXML reports in Datadog better! 🙇

szymon-halik commented 1 month ago

@ManuelPalenzuelaDD @juan-fernandez that works like a charm! Thank you for delivering that! [Screenshots attached]