dotnet / arcade

Tools that provide common build infrastructure for multiple .NET Foundation projects.
MIT License

Missing / mangled log files under Azure DevOps 'Attachment' section #9865

Closed ulisesh closed 2 years ago

ulisesh commented 2 years ago

Reported by @hoyosjs

Log files used to be available in the Attachments section (as in Pipelines - Run 20220621.9 (azure.com)), but they are no longer visible after Pipelines - Run 20220621.7 (azure.com).

ChadNedzlek commented 2 years ago

Given that the build that is referenced is pointing to a commit messing with the STDOUT of tests, and we haven't changed anything in this area in a long time (certainly not in the time frame of this issue), it's because the repository stopped writing output.

I'm going to see if I can find a test in a different repository that has output to verify.

hoyosjs commented 2 years ago

This one is a private helix run (it's testing internal, these tests are not things that we can open source since they have product dependencies). We haven't changed infra at all and logs still get registered locally.

ChadNedzlek commented 2 years ago

The commit referenced by the bad build, https://dev.azure.com/dnceng/internal/_git/dotnet-diagnostictests/commit/0e930ad0039f56a4c991a9a4717ff1bed99d1871, includes a bunch of changes to STDOUT.

I'm running https://github.com/dotnet/arcade/pull/9898 through, which adds a failing test with output to see if it's getting included.

ChadNedzlek commented 2 years ago

It does look like arcade is missing it on "Trx.Fail1"... so we definitely broke something here... a month ago. Taking a look.

ChadNedzlek commented 2 years ago

This looks to be a problem with AzDO? The site says it's a 9K file, but when I download it, it's empty. Also, I don't see any of our code mangling the name to "TestResult_{GUID}" anywhere. I'm going to open an IcM.

ChadNedzlek commented 2 years ago

I also don't see any changes near this time frame in either piece of our code that could account for this...

ChadNedzlek commented 2 years ago

Yup, this is definitely an Azure DevOps change. I made a raw test attachment upload, and it's not returning the contents. It's both messing with the name (prepending the TestResult{GUID} to the name) and failing to save the contents, though the "size" reported is correct. I'm opening an IcM.
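For anyone who wants to repeat this kind of round-trip check, here is a minimal sketch. It is not the code used in this investigation: `diagnose_attachment` and its parameters are hypothetical names, the returned name, bytes, and reported size are assumed to have been fetched from the service already, and the prefix pattern is only inferred from the mangled names described in this thread.

```python
import re

# Assumed shape of the mangled name: "TestResult", an optional underscore,
# a GUID, an optional separator, then the original file name.
# This is inferred from the comments in this issue, not from any AzDO spec.
_GUID = r"\{?[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}?"
_MANGLED = re.compile(r"^TestResult_?" + _GUID + r"[_\-.]?(?P<rest>.*)$")


def diagnose_attachment(uploaded_name, uploaded_bytes,
                        returned_name, returned_bytes, reported_size):
    """Compare what was uploaded with what the service returned.

    Returns a list of short problem labels; an empty list means the
    attachment round-tripped cleanly.
    """
    problems = []

    if returned_name != uploaded_name:
        match = _MANGLED.match(returned_name)
        if match and match.group("rest") == uploaded_name:
            problems.append("name-prefixed")
        else:
            problems.append("name-changed")

    if returned_bytes != uploaded_bytes:
        # The symptom reported here: correct size in the UI, empty download.
        if not returned_bytes and reported_size == len(uploaded_bytes):
            problems.append("content-empty-size-ok")
        else:
            problems.append("content-differs")

    return problems
```

Keeping the name check and the content check separate means a partial fix (for example, contents restored but name still prefixed) shows up as a shorter list rather than a plain pass or fail.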

missymessa commented 2 years ago

@ChadNedzlek is this going to affect the test reporting functionality, too?

ChadNedzlek commented 2 years ago

I don't think so. It seems to only affect test attachments. The run attachments are presumably fine (otherwise we'd be getting "zero" passed tests, which we don't appear to be). It's been broken since 6/21, so we'd have noticed by now if that were the case.

ChadNedzlek commented 2 years ago

The incident has been acknowledged by Azure DevOps, and they will be rolling a fix out in their next rollout. I've asked for a specific date.

ChadNedzlek commented 2 years ago

Current estimate for when this change will be live is approximately 2 weeks. So we'll keep an eye out to make sure everything looks good EOD 6/22.

ilyas1974 commented 2 years ago

@ChadNedzlek is there an update on this issue?

ChadNedzlek commented 2 years ago

I meant to write 7/22 in my last update (since 6/22 is a month in the past). Until they ship the fix, we are just waiting.

ilyas1974 commented 2 years ago

@ChadNedzlek Just wanted to follow up and see if Azure DevOps was able to resolve this issue for us as promised?

hoyosjs commented 2 years ago

We still don't see any attachments (now we see garbage instead of a blank attachment).

ChadNedzlek commented 2 years ago

It's the original data, base64 encoded. So it sounds like we need another IcM.
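Until the service is fixed, an affected download can be recovered locally. The sketch below is a heuristic under stated assumptions, not an official workaround: `recover_attachment` is a hypothetical helper, and it assumes the whole attachment body is a single strictly valid base64 string with no line breaks.

```python
import base64
import binascii


def recover_attachment(downloaded: bytes) -> bytes:
    """Return the decoded bytes if `downloaded` is strictly valid base64.

    Otherwise return the input unchanged, so attachments that were never
    encoded pass through untouched.
    """
    stripped = downloaded.strip()
    if not stripped:
        return downloaded
    try:
        # validate=True rejects any character outside the base64 alphabet,
        # so ordinary log text (spaces, colons, newlines) is not decoded.
        return base64.b64decode(stripped, validate=True)
    except (binascii.Error, ValueError):
        return downloaded
```

A short log that happens to consist only of base64 characters (for example `abcd`) would be decoded by mistake, so this is best applied only to attachments already known to be affected.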

ChadNedzlek commented 2 years ago

I moved this out of tracking and unassigned myself, since we're going to need another IcM to fix the new broken behavior.

ChadNedzlek commented 2 years ago

Or we can reactivate the old one? The name is still messed up, after all.

MattGal commented 2 years ago

I am opening a fresh IcM.

MattGal commented 2 years ago

First IcM: https://portal.microsofticm.com/imp/v3/incidents/details/318952542/home New IcM: https://portal.microsofticm.com/imp/v3/incidents/details/323269195/home

MattGal commented 2 years ago

TFS Item from first IcM: https://dev.azure.com/mseng/AzureDevOps/_workitems/edit/1967818

MattGal commented 2 years ago

Update: A new fix is supposedly deployed and should roll out by end of week.

MattGal commented 2 years ago

Doesn't seem to work retroactively (i.e. the original repro still looks broken) but checking out results from today, the file name and preview contents are correct, so I can finally close this.

garath commented 2 years ago

Reopening. This fix was rolled back over the weekend due to dotnet/arcade#10358.

IcM 318952542 has been reactivated.

garath commented 2 years ago

IcM update: Fix expected in M209, which, if I read the patterns correctly, will start rolling out next week.

garath commented 2 years ago

Fix is rolled out. Test team will complete a retrospective.