Open JulieLeeMSFT opened 5 months ago
Is the crash happening during the build or during tests? Which pipeline in particular are you interested in?
Helix does collect dumps so I'd like to get some details to understand why that wasn't working here. Some docs on how it works may be found here: https://github.com/dotnet/arcade/blob/b4e9225c6c2f9da42fbb611a5e8942a08476fe89/Documentation/Dumps/Dumps.md
This is about AzDO builds, so Helix doesn't help.
Related / partial duplicate: https://github.com/dotnet/dnceng/issues/1290
Would it help at all if we looked into this feature from 1ES? https://eng.ms/docs/cloud-ai-platform/devdiv/one-engineering-system-1es/1es-docs/1es-hosted-azure-devops-pools/hold-machine-for-debugging?
Extracting dumps is one of the scenarios that is specifically called out.
Another one to make sure we consider in triage @ilyas1974 and @garath
> Would it help at all if we looked into this feature from 1ES? https://eng.ms/docs/cloud-ai-platform/devdiv/one-engineering-system-1es/1es-docs/1es-hosted-azure-devops-pools/hold-machine-for-debugging?
> Extracting dumps is one of the scenarios that is specifically called out.
Note that this option sounds costly, because all build machines would be held for a while after a build completes. It sounds like holding a machine only after a failure may eventually be implemented, though the feature may remain expensive even then.
This is a feature that would fit well in the Arcade SDK.
Recently, we had an intermittent crash in the runtime during the VMR build for Preview 1, and it took multiple days to reproduce the issue and pinpoint the cause. Because there is currently no infrastructure for getting crash dumps from AzDO build pipelines, it was an extremely painful process.
With a build as complex as the VMR, it is essential to make the system diagnosable and to have crash dump capability in AzDO builds.
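As a rough illustration of what such a capability might look like, here is a minimal Python sketch. It assumes the crashing process is a .NET process that honors the documented `DOTNET_DbgEnableMiniDump`, `DOTNET_DbgMiniDumpType`, and `DOTNET_DbgMiniDumpName` environment variables. The helper names `dump_env` and `collect_dumps`, the `coredump.` file prefix, and the directory layout are hypothetical and not part of Arcade or the VMR.

```python
import os
import shutil
from pathlib import Path


def dump_env(dump_dir, base_env=None):
    """Return a copy of the environment with .NET crash-dump collection enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOTNET_DbgEnableMiniDump"] = "1"  # ask the runtime to write a dump on crash
    env["DOTNET_DbgMiniDumpType"] = "4"    # 4 = full dump
    # Assumption: %p in the name template is expanded to the crashing process ID.
    env["DOTNET_DbgMiniDumpName"] = str(Path(dump_dir) / "coredump.%p")
    return env


def collect_dumps(dump_dir, artifact_dir):
    """Copy any dumps found in dump_dir into artifact_dir; return the copied paths."""
    dump_dir, artifact_dir = Path(dump_dir), Path(artifact_dir)
    if not dump_dir.is_dir():
        return []  # nothing crashed, or dumps were never enabled
    artifact_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for dump in sorted(dump_dir.glob("coredump.*")):
        copied.append(Path(shutil.copy2(dump, artifact_dir / dump.name)))
    return copied
```

A build step could launch the build with `subprocess.run(cmd, env=dump_env(dump_dir))` and, on a non-zero exit code, call `collect_dumps` to stage the files in a directory that the pipeline then publishes as an artifact. How the publishing step is wired up depends on the pipeline and is left out here.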
It was especially painful to identify the exact VMR commit that introduced the regression. The VMR doesn't have a single commit that corresponds to a single commit from the runtime repo. A commit in the VMR represents a batch of commits, one for each repo flowing into installer. So it was not possible to simply check out commits in the VMR to identify the specific offending commit in runtime.
cc @markwilkie @agocke @jkotas @mthalman @MichaelSimons @hoyosjs @tommcdon