Segfaults in Windows 2022 MSVC CI job

Dobiasd commented 1 month ago

https://github.com/Dobiasd/FunctionalPlus/actions/runs/9435686090/job/25989531257

97% tests passed, 11 tests failed out of 423
Total Test time (real) =   2.88 sec
The following tests FAILED:
      2 - benchmark_example (SEGFAULT)
      3 - benchmark_input_output_args (SEGFAULT)
    307 - queue_test - full (SEGFAULT)
    364 - side_effects_test - execute_parallelly (SEGFAULT)
    368 - side_effects_test - execute_parallelly_atomic (SEGFAULT)
    369 - side_effects_test - for_each (serial and parallel (SEGFAULT)
    384 - transform_test - transform (SEGFAULT)
    385 - transform_test - reduce (SEGFAULT)
    386 - transform_test - keep_if_parallelly (SEGFAULT)
    389 - transform_test - transform_reduce_parallelly (SEGFAULT)
    390 - transform_test - transform_reduce_1_parallelly (SEGFAULT)

Looks like something thread-related.

pthom commented 1 month ago

Hello Dobias,

I tried to tackle this issue. Unfortunately I failed, and I cannot study it more in detail because I'm stuck on other projects.

However, I have some details which might be of interest to you.

I cannot reproduce this on my computer, using Visual Studio 2022 Preview for ARM

The test will pass without any issue if I compile for x64 (I have an issue when compiling for ARM64, see below)

Inconsistencies in the CI:

On Windows: I'm a bit confused about this error because it is a bit inconsistent: I swear I saw the CI work, then fail for no good reason.
on macOS: the CI works 70% of the times, but can fail: see this run for example.

Inconsistencies when compiling for Windows ARM:

on Windows ARM, running ci_build.sh will fail with the following message:

99% tests passed, 1 tests failed out of 423

Total Test time (real) =   4.63 sec

The following tests FAILED:
         52 - container_common_test - interweave (Failed)
Errors while running CTest

pascal@PASCALTHOMEA90C MINGW64 /f/dvp/OpenSource/FunctionalPlus (fix_win_ci_segfault)
$ ./script/ci_build.sh

when trying to diagnose this on my side, I see that it works perfectly in debug mode, But I can reproduce the issue when using "RelWithDebInfo"

I'm not sure this is related to the seg fault you're experiencing but let me show you what I saw:

TEST_CASE("container_common_test - interweave")
{
    using namespace fplus;
    REQUIRE_EQ(interweave(IntVector({ 1, 3 }), IntVector({ 2, 4 })), IntVector({ 1, 2, 3, 4 }));
    REQUIRE_EQ(interweave(IntVector({ 1, 3, 5, 7 }), IntVector({ 2, 4 })), IntVector({ 1, 2, 3, 4, 5, 7 }));

    // This test fails in release mode (but will work in Debug mode), when compiling for windows ARM
    //REQUIRE_EQ(unweave(IntVector({ 0, 1, 2, 3 })), std::make_pair(IntVector({ 0, 2 }), IntVector({ 1, 3 })));   

    // Let's show the result
    auto u = unweave(IntVector({ 0, 1, 2, 3 }));
    std::cout << fplus::show(u) << std::endl;     // will show ([1, 3], [0, 2]) instead of ([0, 2], [1, 3])

    //REQUIRE_EQ(unweave(IntVector({ 0, 1, 2, 3, 4 })), std::make_pair(IntVector({ 0, 2, 4 }), IntVector({ 1, 3 })));
}

I don't know if this is related.

PS: I might contact you by email in the next days because I'm working hard on a new project which could be interesting for researchers. I'm looking for feedback, and I felt you might have an interesting feedback about it (if you have time)

Dobiasd commented 1 month ago

Wow, thanks a lot for the investigation! :heart:

The flakiness of the macOS CI should be fixed by this commit, which I just pushed. :white_check_mark:

The problem with the Windows tests, however, is still a riddle to me. :monocle_face:

And, sure, looking forward to your email. :+1:

pthom commented 1 month ago

The interweave issue is probably unrelated, and due to a compiler bug.

See https://stackoverflow.com/questions/78608330/postfix-increment-compiler-bug-on-visual-studio-2022-arm-preview

You should probably move the ++ increment at the end of the loop.

pthom commented 1 month ago

You are probably fighting against the limitation of the GitHub runners. Maybe some limitations in threading.

I just rented a visual studio machine on Azure to test it. With Visual Studio Professional 2022 17.10 on x86 64bits / Azure, the tests do run fine.

Dobiasd commented 1 month ago

Thanks a lot for the interweave fix! :heart_eyes:

Regarding the Windows tests: If they cause so much trouble and additional complexity, maybe we should just drop them completely? Edit: Meh, I now remember there were quite a few problems with MSVC, which required code changes in fplus. Not having test coverage for those would also not be too nice. : /

Dobiasd commented 1 month ago

On a test branch, I reproduced the segfault with a minimal example:

TEST_CASE("show_versions - std_thread")
{
    auto t = std::thread([]() { });
    t.join();
}

TEST_CASE("show_versions - std_async")
{
    auto handle = std::async(std::launch::async, []() { return 1; });
    handle.get();
}

Windows 2022 MSVC result:

...
2/3 Test #2: show_versions - std_thread .......   Passed    0.17 sec
3/3 Test #3: show_versions - std_async ........***Exception: SegFault  0.29 sec

pthom commented 1 month ago

On a test branch, I reproduced the segfault with a minimal example:

Wow, well done!

The next step to be able to gather more feedback from other users would be to set up a very simple repository. It would contain:

#include ...
int main()
{
    auto handle = std::async(std::launch::async, []() { return 1; });
    handle.get();
}

and it would compile and run this in CI. What do you think? I think I will try it (or you can if you are impatient to see the result: I cannot do it before this afternoon).

pthom commented 1 month ago

The advantage of this version is that if it does not fail, we could then add doctest, and see if the combination doctest + thread is the cause of the issue

Dobiasd commented 1 month ago

Good idea, thanks, and no hurry. ☺️

pthom commented 1 month ago

Ok, it can be reproduced with the simplest possible async code!

See https://github.com/pthom/study_ci_thread_fault

And https://stackoverflow.com/questions/78612314/github-runners-issue-with-visual-studio-2022-segmentation-fault-with-the-simple

pthom commented 1 month ago

Look at the comments on this SO question:

https://stackoverflow.com/questions/78612314/github-runners-issue-with-visual-studio-2022-segmentation-fault-with-the-simple

I think the bug was fixed - under my feet - by GitHub while I was reporting it to Stack Overflow. You may want to rerun the CI, it worked on my side

I am not lucky these days with compiler and platform bugs 🙃

Dobiasd commented 1 month ago

Cool minimal example! Perhaps your great report made somebody fix the bug (silently).

Dobiasd commented 1 month ago

Ah, no, it seems the work on the other issue happened before.

Yeah, not lucky indeed. :grimacing: We would have solved the problem too by simply doing nothing for a few days. :sloth:

Dobiasd commented 1 month ago

Closing this issue, since the CI runner is fixed. :heavy_check_mark:

Thanks again for the nice collaboration. Situations like this (especially with long-term contributors like you) are what make maintaining an open-source project worthwhile for me. :heart:

Dobiasd / FunctionalPlus

Segfaults in Windows 2022 MSVC CI job #304