Closed Willenbrink closed 1 year ago
I have a VERY simple PR going, #58431, that changes one line in each of 2 files, neither of which is code - just player-facing UI. It's also failing on GCC 9, Curses, LTO. Unless these lines are being called somewhere else, I don't think they should be causing issues like that.
It might be the simplest PR you can get that also triggers the failure, which might be useful for finding out what the issue is.
Cool. For some reason I can't edit my issue anymore, as GitHub complains about changes to the text during editing. Anyway, thanks for the info. That will hopefully be useful. It seems that all failures are related to maps but share no other similarities. Perhaps this is some sort of map corruption.
I'm unfortunately a bit busy for the next few days but if someone wants to take a look, I would start with the test failure below as that might be easier to reproduce than the segfaults.
Last function | Run |
---|---|
map::add_vehicle | https://github.com/CleverRaven/Cataclysm-DDA/runs/6890574747 |
MapgenRemovePartHandler::add_item_or_charges | https://github.com/CleverRaven/Cataclysm-DDA/runs/6929376451 |
Test failure | https://github.com/CleverRaven/Cataclysm-DDA/runs/6917109222 |
Looks like the test failure might be unrelated, it is being fixed in #58442.
I don't think this is the same thing. #58442 addresses a recurring consistency check failure in structure creation; the issue you report here is less routine.
It might be worth mentioning here that I am also experiencing segmentation faults in explosion tests in the binary compiled with GCC 11.2 with LTO enabled.
@BrettDong Can you provide the exact commands that lead to the segfault? I just tried to reproduce it but couldn't with GCC 12.1.1 and TILES=1 SOUND=1 LTO=1. I've got a new CPU now and am ready for some recompiling to pin down the issue. I don't think it's related to the specific version of GCC as it also occurs with Clang.
We also see some random crashes in the GCC LTO CI tests on GitHub Actions recently, see discussions in #59148.
It is not a deterministic crash. I get one crash in roughly every 20-50 runs.
Hmm, okay. That's quite rare. I will try it a few more times.
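For a crash that only shows up once in every 20-50 runs, a small loop that reruns the test binary and stops at the first non-zero exit can save some manual effort. This is a minimal sketch and not something taken from this thread: `run_until_failure` is a hypothetical helper, and the command passed to it (the locally built test binary plus whatever filter selects the explosion tests) is an assumption that has to be adapted to the local setup.

```python
import subprocess


def run_until_failure(cmd, max_runs=50):
    """Run cmd repeatedly.

    Returns (run_number, completed_process) for the first run that exits
    with a non-zero status, or None if all max_runs runs succeed.
    """
    for run in range(1, max_runs + 1):
        # Capture the output so the log of the failing run is not lost.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # On POSIX a negative return code means the process was killed
            # by a signal, e.g. -11 for SIGSEGV.
            return run, result
    return None
```

With a failure rate this low, `max_runs` probably needs to be well above 50 before a clean result says much.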
In this comment @BrettDong points out one failure that had the message
free(): invalid next size (fast)
which indicates heap corruption. If there is a heap corruption bug that would explain why we see random nondeterministic failures in various places, because heap corruptions can manifest in many bizarre ways.
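When going through CI logs for more cases like this, it may help to flag glibc's heap-consistency abort messages automatically. A rough sketch, assuming the log is available as plain text: `find_heap_corruption_lines` is a hypothetical helper, and the marker list is a small, non-exhaustive sample of glibc messages rather than anything taken from the CI setup.

```python
# Substrings of abort messages glibc prints when it detects a damaged heap.
# Non-exhaustive; other allocators and platforms use different wording.
HEAP_CORRUPTION_MARKERS = (
    "free(): invalid next size",
    "free(): invalid pointer",
    "double free or corruption",
    "malloc(): corrupted top size",
    "corrupted size vs. prev_size",
)


def find_heap_corruption_lines(log_text):
    """Return (line_number, line) pairs for log lines containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in HEAP_CORRUPTION_MARKERS):
            hits.append((number, line.strip()))
    return hits
```

A hit only shows where the allocator noticed the damage; the write that caused it can be far earlier, which is why the crashes land in unrelated functions.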
Usually, the best way to investigate heap corruption is to compile with AddressSanitizer (ASan). We do have ASan builds in CI. Have any of the above issues been on ASan builds?
Yes, this one, this two, this three. Unless I'm misunderstanding you? I've considered approaching this with the rr-debugger but haven't looked into ASan yet.
I believe all three of those are examples of the bug I fixed in #59141, so they are unrelated to the crashing bugs seen in some of these other examples. I was hoping for something that had segfaulted under ASan.
These are vehicle placement bugs, not the heap corruption errors we are discussing.
Ah, got you. No, I haven't noticed any heap corruption with ASan.
https://github.com/CleverRaven/Cataclysm-DDA/runs/7335244668?check_suite_focus=true#step:16:671 This might be of interest here.
Here are two Windows builds failing with bad allocation:
I assume these are related?
Here's a successful build, but with the TEST_CASE( "overmap_terrain_coverage", "[overmap][slow]" ) (the very same test case that crashed with bad allocation on the Windows builds I've linked) running extremely slowly (~11 minutes): https://github.com/CleverRaven/Cataclysm-DDA/runs/7568231915#step:16:610
Note that I'm not blaming the test case itself, but that might help narrow down the culprit.
@Stadler76: Most likely related to #55104 then
So the test is to blame? I was suspecting the mapgen function itself. Anyway, thanks for the pointer.
The test uncovered a problem in one of the mapgen functions. I think the problem was fixed just recently, so tomorrow's build should hopefully not exhibit this particular problem.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Please do not bump or comment on this issue unless you are actively working on it. Stale issues, and stale issues that are closed are still considered.
Describe the bug
I'm currently running into repeated CI failures which I can't reproduce and which occur in different parts of the code. I can't rule out that the failures are caused by changes in my PRs, but I don't see any obvious cause as I'm not working with low-level memory management. As it happens in two different PRs, I wanted to collect information on the failures here and see if others also run into this issue.
Steps to reproduce
Expected behavior
If the failures were reproducible locally by using the same seed, that would also be fine.
Screenshots
No response
Versions and configuration
See the attached links. It seems to happen with multiple different CI actions, at least:
Additional context
MapgenRemovePartHandler::add_item_or_charges
map::furn_set
item::is_corpse()
MapgenRemovePartHandler::add_item_or_charges