MicrosoftPremier / VstsExtensions

Documentation and issue tracking for Microsoft Premier Services Visual Studio Team Services Extensions

Recently, builds have failed due to the following error #115

Closed RobertWildgoose closed 4 years ago

RobertWildgoose commented 4 years ago

[WARNING] Unable to get code coverage data within the maximum wait time.
##[warning]Unable to get code coverage data within the maximum wait time.
Total blocks: 0
Covered blocks: 0
Code Coverage (%): 0
[ERROR] The code coverage value (0%, 0 blocks) is lower than the minimum value (32%)!
##[error]The code coverage value (0%, 0 blocks) is lower than the minimum value (32%)!
##[error]At least one build quality policy was violated. See Build Quality Checks section for more details.

Finishing: Check build quality

It would seem that the quality gate cannot read the coverage file, but I've looked at the changes to the pipeline and nothing seems to have changed since around a week before the quality check stopped working.

Could this be something to do with multiple pipelines?

ReneSchumacher commented 4 years ago

Hi Robert,

I answered your question in the VS Marketplace. As mentioned there, we will need more information about your issue. Could you first check whether you can see code coverage data on the build summary page? If you can, we know that coverage has been published successfully; otherwise, there is a problem with the publishing process.

If you see coverage on the build summary, this issue is most likely caused by slow coverage processing in Azure DevOps. Please add the variable PSGer.Cover.MaxWaitTime to your pipeline and set its value to 1200000. This increases the timeout to 20 minutes. If coverage cannot be read within that time, there must be an issue with the coverage data itself.

René

RobertWildgoose commented 4 years ago

Hey Rene,

Thanks for looking into this,

We can see that the coverage data has its own tab and is published. I've added the above variable and am running a test now to see if this resolves the issue.

Rob

RobertWildgoose commented 4 years ago

Hey @ReneSchumacher ,

I added the PSGer.Cover.MaxWaitTime variable, which extended the build time to 24 minutes, but the build eventually failed anyway.

How can we debug what could be wrong with the coverage data?

ReneSchumacher commented 4 years ago

Hi @RobertWildgoose,

thanks for the feedback. Could you run your build again with the variables System.Debug and BQC.LogRawData set to true and then send me the log files from the BQC task to PSGerExtSupport@microsoft.com?

RobertWildgoose commented 4 years ago

@ReneSchumacher - I've sent them over for you, thank you again.

jeffrutland commented 4 years ago

So, I recently came across this very same issue, seemingly when our DevOps instance gained a second project. Everything else in my build succeeds, and I can see that the coverage is published as part of the build. I added the variables suggested above, and while waiting the 20 minutes configured for the coverage discovery to fail, I realized that the link being checked is missing the project name. In the raw data below, I see this in the log (with my organization name replaced by '{organization}'):

##[debug]-------------------- RAW DATA --------------------
##[debug]buildCoverageSummary:
##[debug]{"coverageData":[],"build":{"id":"504","url":"https://dev.azure.com/{organization}/_apis/build/Builds/504"},"deltaBuild":null}
##[debug]------------------ END RAW DATA ------------------

When I hit that URL, however, I get a 404. But when I add my project name to the URL path (shown here as '{project}'), it works:

https://dev.azure.com/{organization}/{project}/_apis/build/Builds/504

I also see the following relevant variables logged at the beginning of the pipeline log:

##[debug]System.TeamFoundationCollectionUri=https://dev.azure.com/{organization}/
##[debug]System.CollectionUri=https://dev.azure.com/{organization}/
##[debug]System.TeamProject={project}
##[debug]Build.BuildId=504

So it would appear to me that the URL being generated/requested is missing the project name. How is this URL created? My project name has spaces in it; I'm not sure whether that is an issue here.
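
For anyone who wants to reproduce the check, here is a rough sketch of it as a standalone script (purely illustrative; the AZDO_PAT variable, the api-version value, and Node 18+ for the global fetch are my own assumptions, not anything the BQC task uses):

```ts
// Quick manual check of the two URL variants (throwaway script, not part of the task).
// Assumes a PAT with Build (read) scope in the AZDO_PAT environment variable.
const auth = "Basic " + Buffer.from(":" + (process.env.AZDO_PAT ?? "")).toString("base64");

async function check(url: string): Promise<void> {
  const res = await fetch(url, { headers: { Authorization: auth } });
  console.log(res.status, url);
}

(async () => {
  const org = "https://dev.azure.com/{organization}";
  const project = encodeURIComponent("{project}"); // spaces in the project name get percent-encoded

  await check(`${org}/_apis/build/Builds/504?api-version=6.0`);            // 404 without the project
  await check(`${org}/${project}/_apis/build/Builds/504?api-version=6.0`); // succeeds with the project
})();
```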

If there's anything else I can supply to help address this, please let me know; this is a critical piece of our build pipeline. Thanks.

RobertWildgoose commented 4 years ago

@jeffrutland - I've been working through this with @ReneSchumacher and I'm in the same boat, though it seems you got further than I did in debugging the issue. In our case we hadn't added anything new; it was literally a case of these issues starting to occur when version 7 was released.

Are you saying that adding the project name into the URL fixes the issue? If so, @ReneSchumacher, is there any way we can forward this on to the relevant team?

jeffrutland commented 4 years ago

> @jeffrutland - I've been working through this with @ReneSchumacher and I'm in the same boat, though it seems you got further than I did in debugging the issue. In our case we hadn't added anything new; it was literally a case of these issues starting to occur when version 7 was released.
>
> Are you saying that adding the project name into the URL fixes the issue? If so, @ReneSchumacher, is there any way we can forward this on to the relevant team?

I'm not entirely sure when this started happening in our instance either - but we recently added another project to our DevOps organization. I'm not sure whether that's the root cause here.

But to be clear: yes, I was able to manually construct the link as I mentioned and test it, and I do get the results then. It appears that the project name is missing from the generated URL, which is why it isn't found - in my case, anyway. I'd be interested to hear whether you can retrieve your coverage results if you try the same thing.

ReneSchumacher commented 4 years ago

Hi guys,

thanks for trying to debug the issue on your own. Unfortunately, the assumption about the coverage URL is not correct. The URL you see in the raw output is part of the coverage result, not the request. BQC uses the azure-devops-node-api (see https://github.com/Microsoft/azure-devops-node-api) to query coverage data. This library encapsulates all REST calls so that we don't have to construct the URLs and HTTP headers ourselves. The fact that we receive a proper coverage summary object basically proves that we request the correct URL.
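
For illustration, this is roughly how coverage is read through that library (a minimal sketch, not the actual task code; the organization URL, project name, build ID, and PAT handling here are placeholders):

```ts
import * as azdev from "azure-devops-node-api";

// Minimal sketch of querying the coverage summary via azure-devops-node-api.
// The orgUrl, project, build ID, and AZDO_PAT values are placeholders, not what BQC uses internally.
async function readCoverageSummary(): Promise<void> {
  const orgUrl = "https://dev.azure.com/{organization}";
  const handler = azdev.getPersonalAccessTokenHandler(process.env.AZDO_PAT ?? "");
  const connection = new azdev.WebApi(orgUrl, handler);

  const testApi = await connection.getTestApi();
  // The library takes the project name and build ID; no URL is built by hand.
  const summary = await testApi.getCodeCoverageSummary("{project}", 504);

  console.log(JSON.stringify(summary)); // the same object you see in the RAW DATA output
}

readCoverageSummary().catch(console.error);
```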

I'm not sure why the build URL in the summary information is wrong. That looks like a bug in the API itself, but it doesn't affect BQC.

@RobertWildgoose - if you believe this issue started with v7.x of the task, I assume you have been using v6.x as well? If so, could you just switch back to v6.x and check if this solves the issue? I don't see why it should, as we basically haven't changed the way we read coverage data between v6.x and v7.x. But it's definitely worth a try.

RobertWildgoose commented 4 years ago

Hey Both,

I can confirm that both v7.x and v6.x cause this issue on my pipeline. Going through the options we spoke about, @ReneSchumacher, it has come to no avail.

Any other suggestions?

ReneSchumacher commented 4 years ago

That's what I thought. Since we haven't heard of more issues like this and our internal pipelines with BQC are still all running as expected, this issue must be with Azure DevOps or the coverage data itself. Could you please report the issue at https://developercommunity.visualstudio.com/spaces/8/index.html (choose Report a problem, then Azure DevOps) and send me the link to your problem report? I'll forward it to the product team and try to speed things up.

It would still be good to know how long it actually takes for the data to become accessible. Could you perhaps duplicate the pipeline and disable everything that's not needed to repro the issue (e.g., artifact or symbol publishing)? I guess it should be enough to run the compiler, tests, and BQC. Then set the variable PSGer.Cover.MaxWaitTime to 2400000 (that's 40 minutes) and increase the overall build timeout to 120 minutes. With this we'll hopefully see some coverage results. If BQC cannot fetch coverage within 40 minutes, I suspect the trigger mechanism for the coverage merge job is broken. In that case the merge job would only start when the build finishes, and BQC could never work properly.

RobertWildgoose commented 4 years ago

Hey Rene,

Looking into this, it seems that the following is grabbed:

{"$id":"1","innerException":null,"message":"A potentially dangerous Request.Path value was detected from the client (:).","typeName":"System.Web.HttpException, System.Web","typeKey":"HttpException","errorCode":0,"eventId":0}

ReneSchumacher commented 4 years ago

Hm, where did this come from? Is that output from our task log?

RobertWildgoose commented 4 years ago

Waiting for code coverage data...

##[debug]-------------------- RAW DATA --------------------
##[debug]buildCoverageSummary:
##[debug]{"coverageData":[],"build":{"id":"2518","url":"https://dev.azure.com/Next-EcommerceApps/_apis/build/Builds/2518"},"deltaBuild":null}
##[debug]------------------ END RAW DATA ------------------

https://dev.azure.com/Next-EcommerceApps/_apis/build/Builds/2518 <-- This URL

ReneSchumacher commented 4 years ago

Well, I mentioned that in my post above. The URL that is returned by Azure DevOps in the coverage summary object seems to be wrong. However, that URL is never used by the task. We're using the azure-devops-node-api to call back to Azure DevOps and read the coverage summary (see https://github.com/microsoft/azure-devops-node-api/blob/dcf730b1426fb559d6fe2715223d4a7f3b56ef27/api/TestApi.ts#L900). This basically calls the REST API described at https://docs.microsoft.com/en-us/rest/api/azure/devops/test/code%20coverage/get%20build%20code%20coverage?view=azure-devops-rest-6.1. When raw logging is enabled, we simply log the raw JSON output of that REST call. Regardless of the URLs in that output, we keep reading coverage using the library.

I am quite sure that this is a strange timing issue that is related to the coverage merge job. In order to get to the coverage, many steps must have been performed:

  1. Coverage must have been generated by your testing tool (e.g., VSTest).
  2. Coverage data must have been published to Azure DevOps. This is an asynchronous task, so even if the VSTest task is already marked as finished, coverage might still be uploading in the background.
  3. Once coverage data has been uploaded, the coverage merge job must have been triggered. This is something that (imho) has to be done by the client as well. I know at least one way to upload coverage data without triggering the merge job, and this makes BQC useless as it cannot read the data as long as the build is still running.
  4. If the merge job has been triggered, it must finish its job before we'll see data being returned by the REST API.
  5. Finally, if everything before went well, BQC is able to fetch coverage data and evaluate the policy (a rough sketch of this waiting step follows below). Unfortunately, everything except for this final step is out of my hands :(
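
To make that last step concrete, here is a simplified sketch of what the waiting amounts to (this is not the real BQC implementation; the poll interval and the way PSGer.Cover.MaxWaitTime is read here are assumptions for illustration):

```ts
import { ITestApi } from "azure-devops-node-api/TestApi";

// Simplified sketch of waiting for merged coverage data (not the actual BQC code).
// Reading PSGer.Cover.MaxWaitTime from an environment variable is for illustration only.
async function waitForCoverage(testApi: ITestApi, project: string, buildId: number) {
  const maxWaitTime = Number(process.env.PSGER_COVER_MAXWAITTIME ?? 120000); // milliseconds
  const pollInterval = 10000; // assumed 10 seconds between polls
  const start = Date.now();

  while (Date.now() - start < maxWaitTime) {
    const summary = await testApi.getCodeCoverageSummary(project, buildId);
    // Until the merge job has finished, coverageData stays empty.
    if (summary?.coverageData && summary.coverageData.length > 0) {
      return summary;
    }
    console.log("Waiting for code coverage data...");
    await new Promise((resolve) => setTimeout(resolve, pollInterval));
  }
  throw new Error("Unable to get code coverage data within the maximum wait time.");
}
```
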
nikrivers commented 4 years ago

Hi, just thought I'd chime in here as I'm having the same issue with all of my org's build pipelines (19 in total). I've not seen an obvious pattern, but what I have seen is that the BQC task times out on around 40% of runs, across all pipelines.

This morning I noticed that manually re-running a pipeline after its first failure will almost always result in a successful run, but I know that a few days ago I needed to re-run pipelines several times before the BQC task would succeed.

I also saw an instance of a particular pipeline's BQC task failing while the subsequent manual re-run succeeded. This was interesting because the manual run was only minutes after the original failure, and was queued to the same (self-hosted) agent as the failed run.

jeffrutland commented 4 years ago

Hey @ReneSchumacher, was a problem ticket ever created as you suggested? If so, do you have a link to it? I'm interested in following the progress if anything develops. I've played with alternatives over the last couple of days, and I'm running out of options here. Thanks!

ReneSchumacher commented 4 years ago

@nikrivers Thanks for reporting this. I have forwarded your report to the testing tools team and asked whether there's currently a deployment underway that changes the coverage backend. I'll try to push the topic as best I can.

@jeffrutland I didn't create a problem ticket for this as I'm working directly with the Azure DevOps teams internally. I'm not sure if @RobertWildgoose created a ticket.

RobertWildgoose commented 4 years ago

Apologies,

I didn't get around to raising a ticket @ReneSchumacher.

ReneSchumacher commented 4 years ago

Hi everyone,

we believe we have found the issue in the backend and can fix it by disabling a feature flag. If you are affected by the issue, please send the names of your affected organizations (i.e., https://dev.azure.com/{organization} or https://{organization}.visualstudio.com) to PSGerExtSupport@microsoft.com. We'll let you know when the flag has been turned off for your org(s); the issue should then be resolved.

RobertWildgoose commented 4 years ago

@ReneSchumacher - Thanks again for your help on this.

jeffrutland commented 4 years ago

Sending you my org name - thanks, @ReneSchumacher!

jeffrutland commented 4 years ago

Just asking because I'm curious - if this is indeed tied to a feature flag that is being disabled, is there functionality that will no longer be available once it's turned off? Will there be any other effects from having the flag disabled? Thanks!

ReneSchumacher commented 4 years ago

Hey Jeff,

the feature flag only affects an optimization to how attachments for test runs are handled. Since this apparently still has issues, we're moving back to the "old way" of handling attachments, so there shouldn't be any visible or noticeable impact for you.

ReneSchumacher commented 4 years ago

I'm closing this issue, since it is not directly related to Build Quality Checks and the Azure DevOps teams are actively working on a permanent fix for the backend issue. If you encounter similar problems (BQC timing out), please send your organization name(s) to PSGerExtSupport@microsoft.com so we can unblock you again.