MicrosoftPremier / VstsExtensions

Documentation and issue tracking for Microsoft Premier Services Visual Studio Team Services Extensions

Check Build Quality Intermittently Fails #37

Closed · eheath closed this issue 4 years ago

eheath commented 5 years ago

The Check Build Quality task intermittently fails. It does not wait until all of the code coverage files have been written and fails the build, believing that the code coverage does not meet the defined threshold.

woehrl01 commented 5 years ago

Hi @eheath,

thank you very much for reporting this. It is a known issue. We are currently waiting on a specific API change, which should be released "soon", so we can retrieve the code coverage in a more reliable way.

So this issue should be fixed in one of the next versions.

Until this is fixed, you can adapt the following variables of the BQC task to suit your needs:

- `PSGer.Cover.PollInterval` (interval between polls for coverage data, in milliseconds)
- `PSGer.Cover.MaxWaitTime` (maximum time the task waits for coverage data, in milliseconds)
- `PSGer.Cover.StableDataCount` (number of consecutive polls that must read the same value before the task assumes the coverage merge has finished)

eheath commented 5 years ago

Hi @woehrl01, thank you for the response.

If it helps at all, I've tried several workarounds using different combinations of these variables, including setting `PSGer.Cover.StableDataCount` as high as 10, and we still run into the issue. I suppose I could bump it even higher.

ReneSchumacher commented 5 years ago

Hi Eric,

as Lukas mentioned, there is currently no way to deterministically wait for the coverage data merge, which happens asynchronously in the background. I recommend increasing the overall wait time from its default of 600000 (i.e., 10 minutes) to a higher value such as 1200000, increasing the poll interval to something like 15 seconds (i.e., 15000), and increasing the stable data count to 5 or 6.

If you have several coverage data sets and the merge job is slow, increasing only the stable data count does not provide enough wait time.
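For reference, if you are using YAML pipelines, this tuning could look roughly like the following sketch (a minimal example using the values suggested above; adjust them to your own builds):

```yaml
# Illustrative values only; tune them to your own build and merge times.
variables:
  PSGer.Cover.MaxWaitTime: 1200000   # overall wait time: 20 minutes
  PSGer.Cover.PollInterval: 15000    # poll for coverage data every 15 seconds
  PSGer.Cover.StableDataCount: 5     # require 5 identical reads before trusting the result
```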

Also, I do hope that the new backend rollout finishes soon. It will generally speed up the merge process and provide a stable way to wait for the process to finish. I'll check back with the Azure DevOps teams to get more information about the deployment progress and update this issue when I know more.

René

eheath commented 5 years ago

Thanks, René. Will there be a notification when the backend rollout is complete and more information is available?

ReneSchumacher commented 5 years ago

There probably won't be an official statement unless the testing tools team wants to share some background information about the service. The merge process and new log store are nothing users directly interact with, so they won't see any visible changes (apart from performance improvements).

However, I'll notify everyone who reported the issue and will update this issue as well, so just keep an eye on it.

Btw: the deployment seems to be going rather slowly. Currently, only three out of 15 scale units have been partially updated, so I assume the deployment might take another week or two.

Stratigix commented 5 years ago

We are also waiting for a fix, as we cannot reliably add the Build Quality Checks task to our build process. Our build quality check fails every other build because it cannot reliably retrieve the coverage results. We even upped the poll interval to 10 seconds and the stable data count to 20, but we are still seeing failures, and these settings increase our build time significantly.

ReneSchumacher commented 5 years ago

Hi all,

the testing tools team is finally done with the rollout of the backend changes, so we'll be pushing out a new task version today or tomorrow, depending on the final testing. I'm going to close this issue now, as it should be resolved with the upcoming version. Please let us know if you still hit this issue or any other problem.

Thanks for waiting and happy building!

René

Stratigix commented 5 years ago

Hi @ReneSchumacher, I updated to version 6.* of the build task, and it looks much better, but we still get failures. I ran 11 consecutive builds and 3 of them failed because they could not retrieve the code coverage results. All of these ran with the default settings.

ReneSchumacher commented 5 years ago

Hi @Stratigix

thanks for letting me know. Could you send the log of the Build Quality Checks task to PSGerExtSupport@microsoft.com so I can have a look? In theory, the task should keep polling until it gets the coverage values, unless the backend tells us that there is no code coverage. Perhaps there's still an issue in the backend, or I missed a particular case in the new task version.

Thanks,

René

ReneSchumacher commented 5 years ago

Closing again since the newly reported issue occurred on an on-premises Azure DevOps Server. The new code coverage merge jobs are currently not available for on-prem servers, so the task falls back to the old behavior, which can be tuned using these variables:

- `PSGer.Cover.PollInterval` (interval between polls for coverage data, in milliseconds)
- `PSGer.Cover.MaxWaitTime` (maximum wait time for the task, in milliseconds)
- `PSGer.Cover.StableDataCount` (number of polls that have to read the same value before the task assumes that the code coverage merge has finished)

If you hit issues using the task on-premises, please contact us and we can help tweak the variables to ensure the task works properly for you.

René

tomas-dainovec commented 5 years ago

Hi @ReneSchumacher,

We are still seeing this issue even after updating the task to 6.* on cloud Azure DevOps. However, we are running our own build agents; could that cause any issues? Are any build agent updates necessary?

Thanks,

Thomas

ReneSchumacher commented 5 years ago

Hi @tomas-dainovec,

the private agent should not impact the task in any way, as the changes were made in the service backend, the APIs, and our task logic. All of those changes should work for private as well as hosted agents.

I assume the issue isn't easily reproducible, or is it? I'd like to check the task log for a failing build, but you'd have to run the build with the variables `System.Debug` and `BQC.LogRawData` set to `true` so the necessary information ends up in the log. If you can create such a log file, please send it to our support address PSGerExtSupport@microsoft.com and I'll have a look.
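If you are using YAML pipelines, a minimal sketch of such a diagnostic run could look like this (set the variables just for the run, then remove them again):

```yaml
# Temporary diagnostic settings; remove them after capturing the log.
variables:
  System.Debug: true      # verbose debug logging for all tasks in the pipeline
  BQC.LogRawData: true    # makes BQC include the raw coverage data in its log
```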

Thanks,

René

adammodlin commented 5 years ago

@ReneSchumacher I'm seeing this again; it reproduces consistently, every single time, for one of my builds. Is it still the recommended guidance to set the above variables? Feel free to contact me internally on Teams if you want me to provide any logs.

ReneSchumacher commented 4 years ago

Hi all,

I want to give you a much-needed update on this issue (sorry for the long period of silence). There is currently an issue in the coverage backend of Azure DevOps Services that leads to the unreliable behavior of the BQC task. The Azure DevOps testing tools team is already working on a fix. As soon as it is deployed, we will double-check that everything works properly and update this issue again.

As a workaround, you can add a PowerShell task right before the BQC task that sleeps for about 15 seconds. This should give the backend enough time to process the coverage data before BQC reads it from Azure DevOps.
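In a YAML pipeline, the workaround could look roughly like the sketch below; the BQC task reference and its inputs are placeholders, so keep whatever your pipeline already uses:

```yaml
steps:
# Workaround: give the coverage merge enough time to finish before BQC polls.
- powershell: Start-Sleep -Seconds 15
  displayName: Wait for coverage data merge

# Your existing Build Quality Checks step (reference and inputs assumed).
- task: BuildQualityChecks@6
  inputs:
    checkCoverage: true
    # ... your other BQC settings ...
```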

Thank you so much for your patience!

René

tomas-dainovec commented 4 years ago

Hi @ReneSchumacher, are there any updates on this issue? We are still observing it in our pipelines. Are there any actions we need to take?

ReneSchumacher commented 4 years ago

Thank you for bumping this, @tomas-dainovec! Unfortunately, it has been pretty quiet around this, even though I saw an update on our internal ticket indicating that there should have been a fix. I've just asked the Azure DevOps team for an update and will post any new information here as soon as I get it.

René

ReneSchumacher commented 4 years ago

I'm closing this issue for now. If you still see sporadic timeouts, please open a new issue so we can track them. Since the root cause lies in Azure DevOps Services and not in our task, we cannot really fix it ourselves. However, the more occurrences of this problem get reported, the higher the chances that the product team will fix the issue once and for all.