Open jackjansen opened 2 months ago
Temporarily disabled the failing tests on Windows, but now the resulting nightly
installer exhibits the same issues.
In a way this is good, because it allows local debugging under the Visual Studio debugger.
Downloaded the failing installer on Beelzebub (Windows 11): works like a charm. So maybe that is why I didn't see the problem with manually-built cwipc, because I've always used Windows 11 to build?
I've also started running the individual install-check apps (from `libexec/cwipc`) on Topinambur (Windows 10) under the Visual Studio debugger.
`cwipc_codec_check` fails in the OpenCV DLL initialization routine: it's trying to do something to a mutex and gets a null pointer exception.
`cwipc_realsense2_install_check` fails in the realsense2 DLL initialization routine, in what appears to be the same `msvcp140.dll` mutex routine, `mtx_do_lock()`.
Owwwww, this is very bad. The root cause appears to be an incompatible change that Microsoft has made to mutex constructors.
Found it in this thread: https://forum.juce.com/t/windows-crash-in-apvts-constructor/62039/13
Here is the Microsoft release note: https://github.com/microsoft/STL/wiki/Changelog#vs-2022-1710
Search for "Fixed mutex's constructor to be constexpr".
Updating the MSVC Redist may do the trick: the working machine Beelzebub has `14.40.33810`, the non-working Topinambur has `14.34.31931`.
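To make the version comparison above repeatable, here is a minimal sketch in Python. The helper names are hypothetical, and the threshold is simply the version seen on the working machine; the real minimum that contains the mutex change may be a different 14.40 build, so treat it as an assumption.

```python
# Hypothetical helper names; not part of cwipc.
# Assumption: the version reported on the working machine is new enough.
MIN_WORKING_REDIST = (14, 40, 33810)


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '14.40.33810' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def redist_is_new_enough(installed: str, minimum: tuple = MIN_WORKING_REDIST) -> bool:
    """Compare numerically, component by component (not as strings)."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    machines = {"Beelzebub": "14.40.33810", "Topinambur": "14.34.31931"}
    for machine, version in machines.items():
        status = "ok" if redist_is_new_enough(version) else "too old"
        print(f"{machine}: {version} -> {status}")
```

Comparing tuples of integers avoids the trap of string comparison, where "14.9" would sort after "14.40".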
Investigating a bit further. The GitHub Windows runner has been updated to 14.40.33810 about 3 months ago.
So we should never have had the problem on the GitHub runner in the first place, only on user machines that have an older version installed.
This probably means that one of the third-party packages we install has installed a private copy of `msvcp140.dll` and we are accidentally picking up that one.
This issue seems to be related: https://github.com/actions/runner-images/issues/10055
I've added a `which msvcp140.dll` to my Windows action, and it shows `/c/hostedtoolcache/windows/Java_Temurin-Hotspot_jdk/8.0.422-5/x64/bin/msvcp140.dll`
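`which` only reports the first hit. A rough Python sketch that lists every copy on `PATH`, in search order, could help spot all the shadowing copies at once. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_all_on_path(filename: str, search_path: Optional[str] = None) -> List[Path]:
    """Return every copy of `filename` in the PATH directories, in search order.

    `search_path` defaults to the current PATH; passing it explicitly
    makes the function easy to test.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH components
        candidate = Path(entry) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for index, hit in enumerate(find_all_on_path("msvcp140.dll")):
        print(index, hit)
```

Note that `PATH` is only one part of the Windows DLL search order (the application directory and system directories are consulted first), so the first entry printed here is not necessarily the copy a given process loads.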
Will attempt to apply the workaround from https://github.com/OSGeo/gdal/commit/95d092d2c59961b7580add8d8736434a6c43e587.
No, I'm barking up the wrong tree. Or at least partially the wrong tree: I've now forcibly removed two "bad" copies of `msvcp140.dll` and the correct one is now foremost in `$PATH`, but I'm still having the issue.
Just realised that the problem only occurs in Python tests on the GitHub runners. And realised that something (Matplotlib, I think) includes a slurped version of msvcp140 that it tries to load early.
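To see which installed Python packages ship their own copy, a small sketch like the following could walk the site-packages tree. The function name is hypothetical, and matching on the `msvcp140` filename prefix is an assumption: a bundled copy that was renamed differently would be missed.

```python
import sysconfig
from pathlib import Path
from typing import List, Optional


def find_bundled_runtime(root: Optional[Path] = None, prefix: str = "msvcp140") -> List[Path]:
    """List DLLs under `root` whose lowercased name starts with `prefix`.

    `root` defaults to the platform-specific site-packages directory
    of the interpreter running this script.
    """
    if root is None:
        root = Path(sysconfig.get_paths()["platlib"])
    found = []
    for path in root.rglob("*"):
        name = path.name.lower()
        if path.is_file() and name.startswith(prefix) and name.endswith(".dll"):
            found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for dll in find_bundled_runtime():
        print(dll)
```

Run with the same Python that executes the failing tests, so that the site-packages directory being scanned is the one those tests actually import from.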
There is a Matplotlib issue about this: https://github.com/matplotlib/matplotlib/issues/28551
Removing the matplotlib file didn't work.
See https://learn.microsoft.com/en-gb/sysinternals/downloads/procdump for new ideas.
It seems that Matplotlib 3.9.2 will fix the issue (see the issue linked above). Need to check.
There seem to be issues with `cwipc_codec_python_tests` and `cwipc_kinect_python_tests`.