golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.39k stars 17.59k forks source link

runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows #49471

Open zx2c4 opened 2 years ago

zx2c4 commented 2 years ago

CL 307372 was committed, which contained a nice change to produce crashdumps under certain circumstances. However, several questions were proposed about when those crashdumps would be automatically uploaded to Microsoft in the context of using Insider Builds. In the absence of good answers to that, the otherwise okay CL was reverted (by me) with CL 362454. This proposal is essentially about reverting the revert, and having an extended discussion about under which circumstances these should be enabled and whether we need a new flag for it, as suggested by @bufflig.

Before this proposal moves to an "under consideration" step, we should get a clearer idea about what we're proposing. I'll defer to @zhizheng052 and @bhiggins on this, as they wrote the original CL.

CC @zhizheng052 @bhiggins @bufflig @ianlancetaylor @alexbrainman @aarzilli @aclements

zx2c4 commented 2 years ago

In the CL discussion, there was some question as to when "optional" diagnostics gets enabled. I was just installing a fresh Windows 11 board a few minutes ago and noticed that this was checked by default:

image

aarzilli commented 2 years ago

It seems to me that if you opt to send you crash dumps to microsoft during install (by not opting out) and then opt into creating a crash dump by setting GOTRACEBACK=crash, then having your crash dump uploaded to microsoft is the expected behavior. Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected. Since this is already gated by two options I don't think adding a third opt-in (through a new flag) is going to make a difference. IMO it's either:

zx2c4 commented 2 years ago

Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected.

This wasn't an insider build; I was installing the Windows 11 final release.

aarzilli commented 2 years ago

Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected.

This wasn't an insider build; I was installing the Windows 11 final release.

That sucks. Still, I don't think it changes the rest of my opinion.

bufflig commented 2 years ago

I assume there's something in the registry that enables the "sending to MS" behavior. I wonder if we could detect that and either:

a) Not enable WER or b) Enable WER but warn the user or c) Simply refuse to run when you have that combination (GOTRACEBACK=crash and uploading is enabled)?

Enabling GOTRACEBACK=crash even when you don't actually want the file on disk is not an unreasonable functionality, and was there before this change (it gives you some additional crash information). To get the added behavior that you upload crashdumps to a third party does not seem expected in those cases. At least a stern warning seems mandated, but I'd actually prefer to disable/revert the change for 1.18 and decide exactly how we want to proceed.

zx2c4 commented 2 years ago

I'd actually prefer to disable/revert the change for 1.18 and decide exactly how we want to proceed.

This has already happened, FYI.

There's also the WER option to pop up a UI asking for consent. I suspect this won't work in services, though.

Also, I wonder if we could ditch this whole WER set of options and just create a minidump ourselves? https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/nf-minidumpapiset-minidumpwritedump

elagergren-spideroak commented 2 years ago

To clarify: this is only when GOTRACEBACK=crash is set, right?

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

zx2c4 commented 2 years ago

From the top post:

Before this proposal moves to an "under consideration" step, we should get a clearer idea about what we're proposing.

For the record, I still don't quite know what we're proposing here.

rsc commented 2 years ago

At least the proposal discussion will get more eyes on it. What are the options? Is it possible to write a crash dump that doesn't get collected and sent off?

rsc commented 2 years ago

It sounds like we are still waiting to find out what exactly is being proposed here. Does anyone know?

rsc commented 2 years ago

We are still waiting to find out exactly what is being proposed here. If we can't figure that out, it seems like this would be a likely decline.

alexbrainman commented 2 years ago

To clarify: this is only when GOTRACEBACK=crash is set, right?

Correct.

It sounds like we are still waiting to find out what exactly is being proposed here. Does anyone know?

I reviewed CL 307372. The change makes Go runtime create process crash dumps that can be used by Delve to debug crashes.

The feature is enabled when you set GOTRACEBACK=crash environment variable.

There are also other things that you need to do to enable this feature. Here are the instructions

https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

If we can't figure that out, it seems like this would be a likely decline.

Hopefully my explanation is good enough for you to reconsider.

I am not planing to use this feature today. But I can see how it can be life saving for me and others when they have nothing but crashing program. And crash dump is not good enough.

Also these crash dumps are standard on Windows. Other vendors use them for exactly the same purpose. So I don't see why Go should not implement features available to other tool developers.

Alex

alexbrainman commented 2 years ago

What I also want to add to my comment above is that this feature was requested nearly 5 years ago in #20498. Brad labeled it as NeedsFix and Ian as help wanted, so assumed it was OK to implement this feature.

Different people tried to implement this feature. But the implementation required quite a bit of effort and manual testing.

What made me put extra effort with CL 307372 (most recent attempt) is that I discovered that Delve can now actually use the dump files generated by this change. Which makes this feature quite more useful to more people.

I understand concerns that Jason raised about his crash dumps sent to Microsoft. But he can use GOTRACEBACK environment variable or some settings from

https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

to stop that.

This feature is what Windows does. If we want crash dumps, we have to accept whatever policies Microsoft enforce onto that feature. We can make it clear in the docs what happens when user sets GOTRACEBACK=crash.

Alex

rsc commented 2 years ago

Hi @alexbrainman, thanks for the information. This sounds like a very useful, important feature for users.

I tried to read https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps and am confused about the implications with sending data to Microsoft.

It says "This feature is not enabled by default." Does that mean that you have to edit a registry setting to make GOTRACEBACK=crash work? Does it mean that MS doesn't collect the crashes by default? Or if not, what exactly is the registry setting that has to be changed to stop uploads of GOTRACEBACK=crash-generated files to MS?

Thanks.

alexbrainman commented 2 years ago

It says "This feature is not enabled by default." Does that mean that you have to edit a registry setting to make GOTRACEBACK=crash work?

GOTRACEBACK=crash will still output appropriate stack trace. But GOTRACEBACK=crash will not create crash dump files until you edit your registry. That is how I read Microsoft doco. And that is what I see on my PC.

Does it mean that MS doesn't collect the crashes by default?

I don't believe so. But don't take my word for it. Reading

https://docs.microsoft.com/en-us/windows/win32/wer/using-wer

it talks about Windows OS crashes - see WER sends the report to Microsoft (Watson Server) if the user consented. bit. But we are talking about our program crash dumps. That would be too wasteful for Microsoft to collect all our program dumps.

what exactly is the registry setting that has to be changed to stop uploads of GOTRACEBACK=crash-generated files to MS?

I looked at my friend's PC (with registry untouched by hackers like me), and it does have HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting key, but there is no LocalDumps key inside. See https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps for details.

This registry config does not create any crash dumps regardless of GOTRACEBACK=crash setting.

If you have any Windows PC, you can check its registry yourself. You can use regedit program, if you have GUI access to your PC (direct console or Remote Desktop).

Alternatively you can write Go program to list that registry key. If you start with internal/syscall/windows/registry.TestReadSubKeyNames and adjust couple of lines, you should be able to run that test on Go builders to see what their regstry settings are. If you care.

Alex

beoran commented 2 years ago

@zx2c4 's suggestion to create a minidump should be considered also as a more privacy focused alternative.

rsc commented 2 years ago

It sounds like if we make this change, then developers who have not edited the registry will not see any change at all. And developers who did edit the registry will get exactly what those edits imply. That sounds OK too. So it sounds like we should do this?

Does anyone object to accepting this proposal?

alexbrainman commented 2 years ago

@zx2c4 's suggestion to create a minidump should be considered also as a more privacy focused alternative.

I don't know what you mean by "minidump". But I tested CL 307372 with DumpType registry value set to 2. I also tried dumps created with DumpType registry value of 1, but these are not recognised by Delve.

You can see CL 307372 commit message explaining DumpType values that we tried. Just like I explained it here.

Alex

aarzilli commented 2 years ago

@alexbrainman @beoran It seems to me that there is some confusion brewing here around the term "minidump" stemming from microsoft decision to use "minidump" and "mini dump" to mean two different things:

@zx2c4 was suggesting above to call MiniDumpWriteDump from dbghelp.dll directly, instead of letting WER do it. Either way a minidump is produced and either way the result can be a mini dump or a full dump.

AFAIK mini dumps are recognized by Delve but there is nothing useful in them (because of how goroutines are implemented), which is why it might seem that they aren't.

zx2c4 commented 2 years ago

It sounds like if we make this change, then developers who have not edited the registry will not see any change at all. And developers who did edit the registry will get exactly what those edits imply. That sounds OK too. So it sounds like we should do this?

Does anyone object to accepting this proposal?

Didn't we learn at some point that Windows beta users (called "insiders") get this registry key turned on?

zx2c4 commented 2 years ago

@zx2c4 was suggesting above to call MiniDumpWriteDump from dbghelp.dll directly, instead of letting WER do it. Either way a minidump is produced and either way the result can be a mini dump or a full dump.

Indeed this still seems like the best solution.

aarzilli commented 2 years ago

Indeed this still seems like the best solution.

In theory it makes no difference, if microsoft wants to upload your core dumps they can hook MiniDumpWriteDump just like they can hook WER. The consent they ask is broad. In practice I don't know if there is any difference, has anyone checked that WER actually uploads anything by default and MiniDumpWriteDump doesn't?

zx2c4 commented 2 years ago

WER means we hand the state of a crashing app to the WER API and say "do something with it." As an app developer, I can actually register to receive those dumps on Microsoft Partner Portal. With MiniDumpWriteDump, the third param is a file handle, which means we're in control. I mean sure, Microsoft could shim this and intercept everything; it could do all this behind the scenes too without any API call. But that seems a bit much tinfoil right? That'd be a Big Deal if they were doing that, whereas WER is kind of already marked as part of Microsoft's "crash into this blackbox" apparatus.

rsc commented 2 years ago

@aarzilli Thanks for the tutorial on "minidumps" vs "mini dumps". You said that "mini dumps" are recognized by Delve but basically useless because they don't have memory contents. What about "minidumps" as generated by MiniDumpWriteDump? Do they work with Delve?

If the WER API and MiniDumpWriteDump both write files that work with Delve, and if we have uncertainty about the WER API and uploads to MS but have no uncertainty about MiniDumpWriteDump, then it seems like the choice is clear: use MiniDumpWriteDump.

But are both of those if statements true?

aarzilli commented 2 years ago

What about "minidumps" as generated by MiniDumpWriteDump? Do they work with Delve?

As far as I can tell procdump.exe uses MiniDumpWriteDump to write its minidumps and Delve works with those.

rsc commented 2 years ago

OK, well if MiniDumpWriteDump works with Delve, then my previous complex conditional reduces to "the choice is clear: use MiniDumpWriteDump".

Does anyone object to using MiniDumpWriteDump instead of the WER API?

rsc commented 2 years ago

Retitled to what I believe we converged on. Any objections?

zx2c4 commented 2 years ago

Seems okay to me.

rsc commented 2 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 2 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

iglendd commented 2 years ago

Thank you guys for considering this parity (with Linux) implementation. I would like to humbly mention a few hopefully relevant items.

Sorry for a large wish list :) but may be you can accommodate or at least consider some of it :)

qmuntal commented 1 year ago

These are all good comments @iglendd, I suggest you submit a separate proposal so we can discuss without polluting this thread.

Heads up: I'm implementing this proposal in the form it has been approved. Won't be able to submit it before go1.20 freeze though.

gopherbot commented 1 year ago

Change https://go.dev/cl/458955 mentions this issue: runtime: write minidump when crashing on windows

qmuntal commented 1 year ago

⚠️ After testing CL 458955, using it for a while and sharing it internally, I would not recommend merging it, even more, I would step back and rethink this proposal.

The downsides of the current proposal have already been presented by @iglendd in his previous comment (without much attention from myself and others):

This makes me think that creating a minidump by just setting GOTRACEBACK=crash is not enough, and even unexpected. Generating a minidump should be more intentional and we should provide the knobs to tweak the generation process.

Then you have WER. WER solves all the previous issues, and more, without adding any burden on our side. IMO, we would provide more value if we focus on deciding how to enable WER rather than implementing an alternate minidump experience.

CL 458955 is still there in case my arguments haven't convinced you.

iglendd commented 1 year ago

@qmuntal thank you for attempting to address this long standing issue. I hope you do not mind me offering a few additional comments.

qmuntal commented 1 year ago

@iglendd I've submitted the proposal #57441, please take a look to see if it resonates with your idea