emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG may be required on Windows versions other than 7 #20583

Open antmjones opened 12 months ago

antmjones commented 12 months ago

Version of emscripten/emsdk: emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.12 (38d1292ba2f5b4a7c8518931f5ae6f97ef0f6827)

Failing command line in full: Not a compile or link-time failure.

Full link command and output with -v appended: Not a compile or link-time failure.

Description of the issue:

I can semi-reliably reproduce what appears to be the issue mentioned in emcc.bat (see https://github.com/emscripten-core/emscripten/blob/main/emcc.bat#L57) whereby cmd.exe returns a non-zero exit code to the parent process even though Python has returned zero. However, I am running Windows 10 not Windows 7.

I am using emcc as supplied by Microsoft as part of the toolchain for compiling a "Blazor" client-side application. Emcc is being called by msbuild which checks for an exit code of zero.

This does not occur for me 100% of the time, and I have been unable to create a simple reproduction of the issue: even compiling a subset of my large project (rather than the entirety) seems to prevent it from occurring. I do, however, have both a ProcMon trace and numerous msbuild logs confirming that cmd.exe exits with a status of 255 (the code always appears to be 255 for me). This is despite the fact that a debug line added manually to emcc.bat showed that %ERRORLEVEL% was zero immediately before exit. Specifically, I added the following line before @exit /b %ERRORLEVEL%:

@echo MUTE_STDIN Error level is: %ERRORLEVEL%

And it consistently printed MUTE_STDIN Error level is: 0.

I have multiple log files demonstrating that this prints zero even though the cmd.exe process then returns 255.
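The mismatch described above can be checked mechanically across many build logs. The following is a minimal Python sketch, not part of emcc or msbuild: it assumes you have already captured the batch file's output as text and the exit code of the cmd.exe process separately, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the debug line that was added to emcc.bat in this report.
_DEBUG_LINE = re.compile(r"MUTE_STDIN Error level is: (-?\d+)")


def reported_errorlevel(output: str) -> Optional[int]:
    """Return the last %ERRORLEVEL% value echoed by the debug line, or None."""
    matches = _DEBUG_LINE.findall(output)
    return int(matches[-1]) if matches else None


def is_mismatch(output: str, exit_code: int) -> bool:
    """True when the batch file printed one error level but the process exited with another."""
    reported = reported_errorlevel(output)
    return reported is not None and reported != exit_code
```

For the logs described here, `is_mismatch(log_text, 255)` would be expected to return `True`, since the debug line prints 0 while the process exits with 255.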

The other curious thing I have noted is that, when this issue is triggered, the lines following @%CMD% %* < NUL in the batch file are echoed to stdout even though they are preceded by an "@" symbol, which possibly suggests that cmd.exe has somehow entered some kind of corrupted state.

Setting EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG to 1 seems to (so far at least!) prevent the issue from occurring for me.
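For anyone launching emcc.bat from their own build script rather than from msbuild, one way to apply the workaround is to set the variable only in the child's environment. This is a hedged Python sketch; the helper names are hypothetical, and only the variable name comes from emcc.bat.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Variable name as used in emcc.bat; everything else here is illustrative.
WORKAROUND_VAR = "EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG"


def env_with_workaround(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the workaround variable set to "1"."""
    env = dict(os.environ if base is None else base)
    env[WORKAROUND_VAR] = "1"
    return env


def run_with_workaround(cmd: Sequence[str]) -> int:
    """Run cmd with the workaround enabled in the child only; return its exit code."""
    return subprocess.run(list(cmd), env=env_with_workaround()).returncode
```

Because the environment is copied, the calling process itself is left unchanged. When emcc.bat is invoked indirectly, as with the Blazor tooling, the variable would instead need to be set in the environment that msbuild inherits.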

I am aware that this is not an emcc bug but almost certainly an issue with cmd.exe, which Microsoft does not appear to be maintaining. If nothing else, perhaps the documentation and/or the comment in the batch files could be updated to make it clear that this issue is not specific to Windows 7.

Let me know if any further details are required.

RReverser commented 12 months ago

I wonder if https://github.com/emscripten-core/emscripten/pull/20416 would help you here more permanently. Can you try to copy-paste emcc.ps1 from the PR to your Emscripten checkout locally and see if it works around the issue?

antmjones commented 12 months ago

Thanks for the swift response and the suggestion. I'm pretty confident that using emcc.ps1 would fix the problem (because I've now managed to better isolate the issue, see below). However, I'll have to submit a bug report to Microsoft to update their Blazor tooling, since emcc.bat is called from the depths of the Microsoft-supplied scripts. For now I'm happy that the workaround of setting EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG should work for me. Really, my motivation for submitting a bug report was just to hopefully save someone else from wasting hours on this like I have!

Anyway, having done a bit more thinking and experimentation, I have managed to consistently reproduce the cmd.exe bug as follows:

As soon as the ... < NUL line completes, the subsequent lines in the batch file are echoed to stdout (regardless of the @ prefix), and if exit /b ... is used the error code is always 255. Omitting the /b allows the batch file to exit with a zero error code, confirming that EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG should suffice for me.

So, in summary, it looks like using ... < NUL as a workaround for the Python issue is risky if the emcc.bat file is being called from something that does redirection of the input and output streams (somewhat ironically given stdin redirection is the issue the < NUL was meant to be working around!). I'm not sure if having stdin closed is a strict requirement for the issue to be triggered (given it seems unlikely to me that msbuild would deliberately close stdin).
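To experiment with the "called from something that redirects the streams" condition, a small harness along these lines may help. It is only a sketch in Python: how msbuild actually launches the batch file is not described in this thread, so redirecting stdin from the null device and capturing output through a pipe is an assumption, and `run_redirected` is a hypothetical name. It is not claimed to trigger the cmd.exe behaviour by itself.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_redirected(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run cmd with stdin, stdout and stderr all redirected; return (exit code, output)."""
    result = subprocess.run(
        list(cmd),
        stdin=subprocess.DEVNULL,  # child reads from the null device, not a console
        stdout=subprocess.PIPE,    # capture stdout through a pipe
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        text=True,
    )
    if result.returncode != 0:
        # Report the exit code seen by the parent, which is what msbuild checks.
        print(f"child exited with {result.returncode}", file=sys.stderr)
    return result.returncode, result.stdout
```

Comparing the returned exit code against what the batch file itself printed shows whether the parent and the script disagree about the result.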

I'll publish my repro code publicly and add an additional comment here to link to it in case it is useful to anyone else, and I will also submit a bug report to Microsoft. I think it's incredibly unlikely that they'll fix anything in cmd.exe given their previous comments, but they may at least update their Blazor tooling with a workaround of some kind (whether using EM_WORKAROUND_WIN7_BAD_ERRORLEVEL_BUG or something else).

For what it's worth, I completely support the idea that the best long-term fix here might be to use PowerShell in favour of batch scripting, because it appears cmd.exe is an unsupported mess!

antmjones commented 12 months ago

Reproduction code here:

https://github.com/antmjones/CmdExe255Test