MicrosoftPremier / VstsExtensions

Documentation and issue tracking for Microsoft Premier Services Visual Studio Team Services Extensions
MIT License

Code coverage results waiting until timeout is thrown #236

Closed xaberue closed 2 months ago

xaberue commented 3 months ago

Describe the context

Describe the problem and expected behavior

Hi all,

We are facing a very annoying issue. Judging by previous issues it seems quite common, but I am not sure whether those describe the exact problem we are facing.

We added a very simple code coverage check using the BuildQualityChecks extension from the marketplace. Even though this is a marketplace extension and not a built-in step, it was working perfectly fine in every project where we ran it. Since yesterday, however, this step has suddenly started failing in every build without any change on our side: the step simply gets stuck until it times out, as if it were waiting for the file in the wrong directory. The dotnet test command is definitely generating the .coverage file, just as in previous runs.

I initially reported this issue to the DevOps team in the MS Developer Community here, in order to understand whether something has changed on their side that may be affecting the behaviour of the extension.

Since it was working perfectly until now in more than one project and is now failing, we are somewhat in the dark about the cause: an agent problem? Our configuration or understanding of the extension? Something else?

Here are some code examples and details about how we are using it:

Relevant part of the pipeline (dotnet test + coverage check)

- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '$(solution)'
    arguments: '--no-restore --no-build --collect "Code Coverage" --settings .runsettings'

- task: BuildQualityChecks@9
  inputs:
    checkCoverage: true
    buildConfiguration: '$(buildConfiguration)'
    buildPlatform: '$(buildPlatform)'
    coverageFailOption: 'fixed'
    coverageType: 'blocks'
    coverageThreshold: '60'

.runsettings

<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
    <DataCollectionRunSettings>
        <DataCollectors>
            <DataCollector friendlyName="Code Coverage" uri="datacollector://Microsoft/CodeCoverage/2.0" assemblyQualifiedName="Microsoft.VisualStudio.Coverage.DynamicCoverageDataCollector, Microsoft.VisualStudio.TraceCollector, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
                <Configuration>
                    <CodeCoverage>
                        <ModulePaths>
                            <Include></Include>
                            <Exclude>                               
                                <ModulePath>healthchecks.azureservicebus.dll</ModulePath>
                                <ModulePath>healthchecks.sqlserver.dll</ModulePath>
                                <ModulePath>healthchecks.ui.client.dll</ModulePath>
                            </Exclude>
                        </ModulePaths>
                        <Functions>
                            <Exclude>
                                <Function>.*\SqlServer.Context\.. *</Function>
                                <Function>.*\SqlServer.Migrations\.. *</Function>
                            </Exclude>
                        </Functions>
                        <UseVerifiableInstrumentation>True</UseVerifiableInstrumentation>
                        <AllowLowIntegrityProcesses>True</AllowLowIntegrityProcesses>
                        <CollectFromChildProcesses>True</CollectFromChildProcesses>
                        <CollectAspDotNet>False</CollectAspDotNet>
                        <EnableStaticNativeInstrumentation>True</EnableStaticNativeInstrumentation>
                        <EnableDynamicNativeInstrumentation>True</EnableDynamicNativeInstrumentation>
                        <EnableStaticNativeInstrumentationRestore>True</EnableStaticNativeInstrumentationRestore>
                    </CodeCoverage>
                </Configuration>
            </DataCollector>
        </DataCollectors>
    </DataCollectionRunSettings>
</RunSettings>

Screenshots of a failed run:

Thank you so much for your valuable feedback.

ReneSchumacher commented 3 months ago

Hi @xaberue,

thanks for reporting this. My guess is that the issue is indeed related to a change on the Azure DevOps side, since our last change to BQC was over two months ago. Timeouts like this are usually caused by one of three things:

  1. Issue in coverage generation
    We have had reports of timeouts where someone accidentally changed the .runsettings file or broke coverage generation by other code changes.

  2. Slow coverage upload/processing
    Sometimes, Azure DevOps or the agent seems to be very slow when uploading coverage data to Azure DevOps or processing it in the background. Since BQC currently only reads preprocessed data from Azure DevOps, we fully rely on Azure DevOps having finished the processing.

  3. Issue with coverage upload
    We also saw cases in which the test task or coverage data publishing task had an issue uploading the data to Azure DevOps. In that case, even though coverage data is created on the agent, it is not visible to Azure DevOps and, thus, not visible through the API.

Issues one and three would be visible in the build summary UI. Please check your build summary for coverage numbers in the upper right corner. There should be a number for tests and below that a number for coverage. If nothing is displayed there, coverage has not reached Azure DevOps. If you see numbers there, please open the following URI in the browser and send the resulting JSON file to me (via email to PSGerExtSupport@microsoft.com):

https://dev.azure.com/{organization}/{project}/_apis/test/codecoverage?buildId={buildId}

Just replace the values for organization, project, and buildId with the correct value for your build.
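
As a minimal sketch, the URI above could be assembled in Python like this. The function name coverage_url and the example values are hypothetical; only the URI pattern comes from this comment.

```python
from urllib.parse import quote, urlencode


def coverage_url(organization: str, project: str, build_id: int) -> str:
    """Build the code coverage URI for one build (pattern taken from the comment above)."""
    # Percent-encode the path segments so names containing spaces stay valid.
    org = quote(organization, safe="")
    proj = quote(project, safe="")
    query = urlencode({"buildId": build_id})
    return f"https://dev.azure.com/{org}/{proj}/_apis/test/codecoverage?{query}"


# Hypothetical organization, project, and build ID, for illustration only.
print(coverage_url("contoso", "My Project", 1234))
```

Opening the printed URI in a browser that is signed in to Azure DevOps should return the JSON mentioned above.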

If you don't see a number in the build summary, there is an issue with either generating or uploading the coverage data. In that case, could you please send me the logs of the DotNetCoreCLI task so I can inspect them?

If you are impacted by the second issue, you would see coverage data in the build summary but still run into issues with BQC. I believe it is unlikely that you are impacted by this, since you didn't reduce the timeout of BQC and it ran for the full ten minutes. If coverage hasn't been processed after that amount of time, there must be a different issue.
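
To picture why a missing upload ends in a timeout, the waiting can be thought of as a polling loop. This is only an illustrative sketch and not BQC's actual implementation: wait_for_coverage, fetch, and the polling interval are hypothetical names and values; only the ten-minute limit comes from the text above.

```python
import time
from typing import Callable, Optional


def wait_for_coverage(
    fetch: Callable[[], Optional[dict]],
    timeout_s: float = 600.0,  # the ten minutes mentioned above
    interval_s: float = 5.0,   # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch` until it returns coverage data or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        data = fetch()
        if data:  # processed coverage has become visible
            return data
        if clock() >= deadline:
            raise TimeoutError("coverage data not available before timeout")
        sleep(interval_s)
```

If coverage never becomes visible through the API while the build is still running, `fetch` keeps returning nothing and a loop like this can only end in the timeout.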

I'll wait for your feedback and then, if necessary, will involve the Azure DevOps PG.

xaberue commented 3 months ago

Hi @ReneSchumacher !!

Thank you so much for your quick assistance, it's really appreciated!

Indeed the results are visible in the faulted execution, here I post a couple of screenshots, I'll share the JSON result as well via email as requested.

image

image

ReneSchumacher commented 3 months ago

Thanks for the update. I will look at the JSON and discuss it with the Azure DevOps PG. Uploading and processing the code coverage data shouldn't take more than ten minutes. There was an issue with the new version of the Publish Code Coverage Results (PCCR) task, which only started processing coverage data after the build had finished. This has been fixed for the PCCR task, but maybe a similar issue has been introduced in the DotNetCoreCLI task. I'll check that.

ReneSchumacher commented 3 months ago

Quick update: the current version of the DotNetCoreCLI task is causing this issue. It is not properly publishing the coverage results, so results are only available after the pipeline has finished. Since BQC must read the data while the pipeline is running, it times out.

Currently, there is only one workaround: switching to the coverlet (x-platform) coverage collector. You can do this by adding the argument --collect:"XPlat Code Coverage" to the arguments of your dotnet test command and using the PublishCodeCoverageResults task to publish the Cobertura coverage file. Here's a sample YAML snippet:

  - task: DotNetCoreCLI@2
    displayName: Build and test with xplat coverage
    inputs:
      command: test
      projects: '**/*.sln'
      arguments: '--collect:"XPlat Code Coverage"'

  - task: PublishCodeCoverageResults@2
    displayName: Publish code coverage
    inputs:
      summaryFileLocation: '$(Agent.TempDirectory)/**/*.cobertura.xml'

Please keep in mind that v2 of the PCCR task only supports line coverage but does support merging multiple coverage files into one. If you need other coverage types (like branches, blocks, etc.), you must use v1 of PCCR. That, however, only supports a single coverage file. I.e., if you have multiple test runs in your pipeline, you must first merge the Cobertura files on the agent (the ReportGenerator tool can do that) and then publish the resulting coverage file.
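
For the merge step, here is a rough sketch of how a ReportGenerator command line could be assembled. The helper name reportgenerator_args and the file paths are hypothetical, and the -reports, -targetdir, and -reporttypes options reflect my understanding of the ReportGenerator CLI, so please check them against the documentation of the version you install.

```python
from typing import List, Sequence


def reportgenerator_args(reports: Sequence[str], target_dir: str) -> List[str]:
    """Build an argument list that merges several Cobertura files into one."""
    if not reports:
        raise ValueError("at least one coverage file or glob pattern is required")
    return [
        "reportgenerator",
        "-reports:" + ";".join(reports),  # input files are separated by semicolons
        "-targetdir:" + target_dir,       # directory that receives the merged report
        "-reporttypes:Cobertura",         # ask for a single merged Cobertura file
    ]


# Hypothetical paths, for illustration only.
args = reportgenerator_args(
    ["a/coverage.cobertura.xml", "b/coverage.cobertura.xml"], "merged"
)
print(" ".join(args))
```

The list can be passed to subprocess.run(args, check=True) on an agent where the tool is installed. The publish task would then point at the merged file in the target directory instead of the individual files.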

xaberue commented 3 months ago

Hi @ReneSchumacher ,

Unfortunately, I have tested this and found some discrepancies in the results, for instance:

image

image

I can see one coverage value in one place, but the BQC step reads a different, much lower one. Why?

To add more fuel to the fire: when I tested again, only the first executions "worked". If I try the same solution now, it doesn't work.

image

- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '$(solution)'
    publishTestResults: false
    arguments: '--no-restore --no-build --collect "XPlat Code Coverage" --settings .runsettings'

- task: PublishCodeCoverageResults@2
  displayName: Publish code coverage
  inputs:
    summaryFileLocation: '$(Agent.TempDirectory)/**/*.cobertura.xml'   

- task: BuildQualityChecks@9
  inputs:
    checkCoverage: true
    coverageFailOption: fixed
    coverageType: lines

Result:

image

Could something at the permissions level, or some other factor in the DevOps configuration or in the steps, be altering this?

It seems absurd that two different mechanisms, both so simple to set up on paper, end up failing in the same way.

Many thanks in advance

xaberue commented 2 months ago

I've finally managed to make it work:

That said, now that it is running again, I don't understand how it worked the first few times. Let me explain everything:

The discrepancies in my last comment occurred because the Cobertura file was different. That project has more than one test project and, as @ReneSchumacher commented before, the BQC step does not support multiple coverage files. That was the reason for the discrepancy.

In another project with a single test project, it worked with the PublishCodeCoverageResults task as @ReneSchumacher suggested.

As for making it work with the VSTest runner, the step that I finally used is PublishTestResults, but I'm still wondering how it worked before without this step. The current configuration is the following:

- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '$(solution)'
    arguments: '--no-restore --no-build --collect "Code Coverage" --settings .runsettings'

- task: PublishTestResults@2
  displayName: Publish code coverage
  inputs:
    testRunner: VSTest
    testResultsFiles: '$(Agent.TempDirectory)/**/*.trx'

- task: BuildQualityChecks@9
  inputs:
    checkCoverage: true
    checkWarnings: true
    coverageFailOption: fixed
    coverageType: lines
    coverageThreshold: 60
    warningThreshold: 15

Note that it is important that publishTestResults is active in the DotNetCoreCLI task (it is by default): this automatically creates the .trx file and drops it into the temp directory $(Agent.TempDirectory).

So it is finally working. Thanks for your initial support, @ReneSchumacher.

I hope this case helps others with similar problems and prompts an investigation into why it worked without the PublishTestResults task (as seen in the initial report), as well as improvements to some of the current steps and/or the current documentation.

Best regards