MicrosoftPremier / VstsExtensions

Documentation and issue tracking for Microsoft Premier Services Visual Studio Team Services Extensions

BQC only reports coverage from first task in build pipeline #124

Closed madebybear closed 3 years ago

madebybear commented 3 years ago

Hi,

We are using Azure DevOps / Repos and basic YAML pipeline tasks.

We are using BQC multiple times in a single build but for different projects, for example:

 # ... snip - checkout and build solution A from a PR ...

  - task: VSTest@2
    displayName: 'Run Unit Tests For Solution A'
    inputs:
      testAssemblyVer2: '**\Path\To\SolutionA.UnitTests.dll'
      searchFolder: src/
      testFiltercriteria: 'Category=Unit'
      runSettingsFile: src/test.runsettings
      vsTestVersion: toolsInstaller
      codeCoverageEnabled: true
      otherConsoleOptions: '/Platform:$(platform)'
      platform: '$(platform)'
      configuration: '$(configuration)'
      diagnosticsEnabled: true
    enabled: ${{ parameters.unit_tests_enabled }}

  - task: BuildQualityChecks@7
    displayName: ${{ parameters.code_coverage_display_name }}
    inputs:
      runTitle: 'Unit Tests - Solution A'
      checkCoverage: true
      coverageFailOption: fixed
      coverageThreshold: ${{ parameters.code_coverage_threshold }}
      buildConfiguration: $(configuration)
      buildPlatform: $(platform)
      showStatistics: true
    enabled: ${{ parameters.code_coverage_enabled }}

 # ... do other stuff like security checks, certificate signing ...
 # ... snip - build solution B from the same PR ...

  - task: VSTest@2
    displayName: 'Run Unit Tests For Solution B'
    inputs:
      testAssemblyVer2: '**\Path\To\SolutionB.UnitTests.dll'
      searchFolder: src/
      testFiltercriteria: 'Category=Unit'
      runSettingsFile: src/test.runsettings
      vsTestVersion: toolsInstaller
      codeCoverageEnabled: true
      otherConsoleOptions: '/Platform:$(platform)'
      platform: '$(platform)'
      configuration: '$(configuration)'
      diagnosticsEnabled: true
    enabled: ${{ parameters.unit_tests_enabled }}

  - task: BuildQualityChecks@7
    displayName: ${{ parameters.code_coverage_display_name }}
    inputs:
      runTitle: 'Unit Tests - Solution B'
      checkCoverage: true
      coverageFailOption: fixed
      coverageThreshold: ${{ parameters.code_coverage_threshold }}
      buildConfiguration: $(configuration)
      buildPlatform: $(platform)
      showStatistics: true
    enabled: ${{ parameters.code_coverage_enabled }}

When we look at the code coverage output in the pipeline, we only ever see the results for Solution A.

Solution B's code coverage output is identical: the total blocks, covered blocks, and percentage reported by both BQC tasks are the same.

Starting: Check Build Quality (90%)
==============================================================================
Task         : Build Quality Checks
Description  : Breaks a build based on quality metrics like number of warnings or code coverage.
Version      : 7.6.2
Author       : Microsoft
Help         : [[Docs]](https://github.com/MicrosoftPremier/VstsExtensions/blob/master/BuildQualityChecks/en-US/overview.md)
==============================================================================
SystemVssConnection exists true
Using IdentifierJobResolver
Validating code coverage policy...
Waiting for code coverage data...
Waiting for code coverage data...
Waiting for code coverage data...
Waiting for code coverage data...
Waiting for code coverage data...
Waiting for code coverage data...
Waiting for code coverage data...
Successfully read code coverage data from build.
Evaluating coverage data from 1 filtered code coverage data sets...
Total blocks: 164
Covered blocks: 163
Code Coverage (%): 99.3902
[SUCCESS] Code coverage policy passed with 99.3902% (163/164 blocks).
Finishing: Check Build Quality (90%)

Any pointers at all here are greatly appreciated.

ReneSchumacher commented 3 years ago

Hi @madebybear,

I believe that there are two issues here:

  1. Timing
    BQC reads coverage data from Azure DevOps and does not look directly at any coverage files. Thus, it can only read coverage data if it has been published to and processed by Azure DevOps. Both publishing and processing of code coverage data run asynchronously, though, which sometimes requires BQC to wait for the data. Unfortunately, there currently is a "bug" in the coverage API so that it sometimes returns a status indicating that all coverage data has been processed even though there are still some pending coverage datasets. I believe that this is what's happening in your pipeline.

    Because you publish coverage data twice with some time in between, the BQC task running after your second VSTest task probably gets an "all done" status and reads only the coverage published by the first VSTest task. You can check that by setting the variables System.Debug and BQC.LogRawData to true and running your pipeline again. You will then see the JSON containing your coverage summary data in the log, and in the second BQC task's log you should see a status property with the value 2, indicating Complete.

    I'm already working with the Azure DevOps product teams to get the status property fixed. In the meantime, the only workaround is to inject some waiting time into your pipeline before you run the BQC task. You could do so by adding a PowerShell task with the inline script Start-Sleep -Seconds 10. This should give the backend enough time to at least start processing the data so BQC can wait for it.

  2. Merging
    Whenever you publish multiple coverage datasets from the VSTest task, there's a coverage merge job that runs to merge all coverage datasets into one. Unless the datasets have different platforms and/or configurations attached to them, the summary coverage data contains the aggregated values for blocks, lines, etc. coverage. Thus, the first BQC task would only see the coverage for the first test run, while the second BQC task cannot inspect just the coverage from the second test run. Instead it will find the sum of coverage from the first and second test run. This is true for all additional BQC tasks you may have in your pipeline.

    You can "trick" the merge job by assigning arbitrary platform/configuration values to the test tasks. In other words: instead of using the variables $(BuildConfiguration) and $(BuildPlatform) in the VSTest task, simply use values like first run and second run for one of them (it doesn't matter whether it's configuration or platform). Then use the exact same value in the corresponding BQC task so it only picks up that specific coverage dataset.
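
For illustration, here is a minimal sketch of how these two suggestions could look together in YAML (the variable names, the 10-second wait, and the first run label follow the text above; all other task inputs are elided and would stay as in the existing pipeline):

# Sketch only - inputs not shown here stay unchanged.
variables:
  System.Debug: true        # verbose task logging
  BQC.LogRawData: true      # BQC logs the raw coverage JSON it reads

steps:
  - task: VSTest@2
    inputs:
      # ... other inputs as before ...
      platform: '$(platform)'
      configuration: 'first run'       # arbitrary label instead of $(configuration)

  - task: PowerShell@2                 # give the coverage backend time to start processing
    displayName: 'Wait For Coverage'
    inputs:
      targetType: 'inline'
      script: 'Start-Sleep -Seconds 10'

  - task: BuildQualityChecks@7
    inputs:
      checkCoverage: true
      buildPlatform: '$(platform)'
      buildConfiguration: 'first run'  # must match the VSTest configuration exactly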

madebybear commented 3 years ago

Hi Rene,

Thank you for your detailed answer.

While trying to diagnose the issue, I did previously add System.Debug and BQC.LogRawData to the variables on the pipeline, so I will inspect the logs for that run again.

I will also try your timing issue fix, by adding a PowerShell task with the following inline script code: Start-Sleep -Seconds 10.

I have also split the tasks that were originally in a single job out into multiple jobs, e.g.


build:
  jobs:
    - job: build_solution_a
      displayName: Build Solution A
      variables:
        configuration: release
        platform: x64
      steps:
        - template: pipeline/templates/build.prepare-for-build.yml@self
        - template: pipeline/templates/build.solution-a-and-run-tests.yml@self
          parameters:
            unit_tests_enabled: true
            code_coverage_enabled: true
            code_coverage_display_name: Check Build Quality (90%)
            code_coverage_threshold: 90

    - job: build_solution_b
      displayName: Build Solution B
      variables:
        configuration: release
        platform: x64
      steps:
        - template: pipeline/templates/build.prepare-for-build.yml@self
        - template: pipeline/templates/build.solution-b-and-run-tests.yml@self
          parameters:
            unit_tests_enabled: true
            code_coverage_enabled: true
            code_coverage_display_name: Check Build Quality (90%)
            code_coverage_threshold: 90

I have yet to try this though.

I'll keep you informed of what works for me.

madebybear commented 3 years ago

Hi Rene,

Here is a breakdown of what we tried:

Splitting into Jobs ❌

This didn't help us (it may be a solution for others, though).

The aim would have been to have one job running one test suite with one build quality check.

In our case we have a PR > CI pipeline that runs and then pushes out to various (gated) production environments, and we use various YAML templates with parameters to achieve this. Splitting this out into jobs meant that we lost the 'scope' of builds and artifacts. Splitting our templates up into jobs was also more work and would have ultimately slowed down our pipelines somewhat.

Adding a Start-Sleep PowerShell Task ❌

This didn't help us. We added a PowerShell task between the VSTest@2 task and the BuildQualityChecks@7 task:

   # ... snip VSTest@2 goes here ...

  - task: PowerShell@2
    displayName: 'Wait For Coverage'
    inputs:
      targetType: 'inline'
      script: 'Start-Sleep -Seconds 10'
      errorActionPreference: 'stop'

   # ... snip BuildQualityChecks@7 goes here ...

The effect of this is that we no longer see Waiting for code coverage data.... We have left this task in for now.

Starting: Check Build Quality (90%)
==============================================================================
Task         : Build Quality Checks
Description  : Breaks a build based on quality metrics like number of warnings or code coverage.
Version      : 7.6.2
Author       : Microsoft
Help         : [[Docs]](https://github.com/MicrosoftPremier/VstsExtensions/blob/master/BuildQualityChecks/en-US/overview.md)
==============================================================================
SystemVssConnection exists true
Using IdentifierJobResolver
Validating code coverage policy...
Successfully read code coverage data from build.
Evaluating coverage data from 1 filtered code coverage data sets...
Total blocks: 164
Covered blocks: 163
Code Coverage (%): 99.3902
[SUCCESS] Code coverage policy passed with 99.3902% (163/164 blocks).
Finishing: Check Build Quality (90%)

Assigning arbitrary platform/configuration ✔

So, it turns out that when we assigned the same arbitrary value to both configuration and buildConfiguration, we saw the desired results.

  - task: VSTest@2
    displayName: 'Run Unit Tests For Solution A'
    inputs:
     # ... snip other config ...
      platform: '$(platform)'
      configuration: 'Solution A - Unit Tests'  # this is the same value as below
      #... snip other config ...

  # ... snip PowerShell@2

  - task: BuildQualityChecks@7
    displayName: ${{ parameters.code_coverage_display_name }}
    inputs:
     # ... snip other config ...
      buildPlatform: $(platform)
      buildConfiguration: 'Solution A - Unit Tests' # this is the same value as above
      #... snip other config ...

We now see the correct number of test runs, the correct blocks and coverage, and the correct number of .coverage files in Azure DevOps, and we're now able to set different thresholds for each type of test (unit, integration, smoke).

Questions

Does setting configuration in VSTest@2 and buildConfiguration in BuildQualityChecks@7 like this have any side effects?

To us, this ultimately feels like a bit of a hack, since we would have expected that configuration & buildConfiguration would simply both be set to release. We realise that when the build task (VSBuild@1) runs for the actual build, it's already set to release, and the path in testAssemblyVer2 (in VSTest@2) is targeting the bin release output anyway.

Which sort of begs the question: if we can set arbitrary values here, what are configuration & buildConfiguration actually for? And what do they default to? (We couldn't find any useful docs on this.)

ReneSchumacher commented 3 years ago

Hey,

glad one of my suggestions actually worked :-D

I fully understand the hassle with breaking apart your pipeline into multiple jobs and I was already expecting that this wouldn't be a good solution. Jobs have their own somewhat isolated scope (they run on a new agent) and this makes it much harder to move state between those jobs.

The wait probably didn't help because you already saw the aggregated values in the second BQC task instance. Just to clarify: my assumption was that maybe both test runs would have 50 covered blocks but the second BQC task only read 50 blocks (i.e., just the result from the first run). If it already had 100 blocks it was "correct" in the sense that it saw the aggregated value.

Now to your question: buildConfiguration and buildPlatform are usually defined to select the correct compiler settings for your build. In your project file (assuming you have a csproj or similar project type) there are usually two property groups with conditions like this:

  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <Optimize>false</Optimize>
    <OutputPath>bin\Debug\</OutputPath>
    <DefineConstants>DEBUG;TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <DebugType>pdbonly</DebugType>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DefineConstants>TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>

Those groups define a couple of compiler settings (e.g., debug symbol creation and optimization). To make life easier, those settings are tied to "labels", and there's the unwritten rule that we name them Release and Debug and usually add the target platform to the label as well. Thus, buildConfiguration and buildPlatform indirectly define the compiler settings.
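
For example, a build step like the following sketch (the solution path is just a placeholder; $(configuration) and $(platform) are the variables used earlier in this thread) passes those labels to MSBuild, which then picks the matching PropertyGroup:

- task: VSBuild@1
  displayName: 'Build Solution A'
  inputs:
    solution: 'src/SolutionA.sln'      # placeholder path
    configuration: '$(configuration)'  # e.g. release -> picks the Release property group
    platform: '$(platform)'            # e.g. x64 or AnyCPU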

While the compile task indeed needs the variables (or at least the correct values for configuration and platform), the test task does not need them. It merely uses both values to label the test result so you can distinguish between multiple test runs, which is what you did with the arbitrary value. There is no other way to prevent the aggregation of multiple coverage datasets, not even from different jobs or stages within the same pipeline, because coverage data is always associated with the overall pipeline and not a specific job/stage. In other words: you are not losing anything by setting arbitrary values.

Hope that helps.

madebybear commented 3 years ago

Thank you again for your detailed answer.

We really appreciate your time, answers and understanding here.