ThrowTheSwitch / Ceedling

Ruby-based unit testing and build system for C projects
http://throwtheswitch.org

Random Failures on same Tests #915

Closed awiegel closed 2 weeks ago

awiegel commented 3 months ago

We have three different Ceedling projects that normally run without errors. However, they occasionally fail at random.

We executed the tests over 1000 times, and around 1% of them failed.

Some of the errors were:

Occurred on the pre-release gem ceedling-0.32.0-2f246f1.

ceedling version

Ceedling => 0.32.0
CMock => 2.5.4
Unity => 2.6.0
CException => 1.3.4

Has anyone else noticed their projects failing randomly?

deltalejo commented 3 months ago

Do you see the same behavior on the latest pre-release version?

mvandervoord commented 3 months ago

@awiegel -- In the project section of your project.yml file, there are likely two settings:

  :test_threads: 8
  :compile_threads: 8

If you set these to 1, do you still get the failures?

Similarly, if you run your tests without the gcov plugin, does it still produce failures?

These symptoms sound like the result of your file system not keeping up with the new threading. I had seen this on Windows with earlier pre-release versions. I haven't seen them with the latest releases, but that doesn't mean there aren't some creeping issues still to be found.

I apologize that you've run into this and hope we can uncover the source!

mkarlesky commented 3 months ago

@awiegel First of all, thank you so much for hammering prereleases so hard! The 1% failure rate certainly sounds like classic nondeterministic behavior such as with threading.

At least one other community member has been using prerelease builds successfully for large, complex, multi-platform test suite builds. They were finding threading bugs early on (reported not through GitHub). Since then, we thought we had found and fixed those problems. Perhaps not! The build you are referencing is months more recent than all that work.

To pile on some other thoughts / questions:

As Mark suggested, please do let us know if cranking down threading to single threads makes a difference. That said, it sounds like it is a non-trivial thing to simply re-run thousands of builds with a changed configuration.
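
For anyone comparing configurations this way, a minimal sketch of a rerun harness might look like the following. It is not part of Ceedling: the function name `failure_rate` is hypothetical, and it assumes a failed run shows up as a non-zero exit status of the command being repeated.

```python
import subprocess


def failure_rate(cmd, runs):
    """Run `cmd` `runs` times and return (failed_runs, failed_runs / runs).

    Assumption: a non-zero exit status means the run failed.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    failures = 0
    for _ in range(runs):
        # Output is discarded here; redirect it to a file instead if the
        # error text of the rare failing runs needs to be inspected.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures, failures / runs
```

Calling `failure_rate(["ceedling", "test:all"], 1000)` once per configuration (for example, 8 threads versus 1 thread) would give two failure rates to compare. That `test:all` is the task being stressed is an assumption; the thread does not say which task was run.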

awiegel commented 3 months ago

Thank you for the quick responses!

I've tested different things now:

If it helps, the tests run in a Docker Ubuntu container that is executed on a Windows PC.

Unfortunately, I cannot share any project data.

Letme commented 3 months ago

If setting the thread counts to 1 solves the runtime problem, then you have a bad setup/teardown for the tests, as it means some shared memory is being overwritten by tests running at the same time. So your "virtualization" is not done correctly (you didn't say which CPU and so on, so we can't really point you in a better direction), and I would look into your general memory layout for the problems.

mkarlesky commented 3 months ago

@awiegel Well, we're learning something here. I'm trying to think of what to ask since confidentiality is a hurdle here.

Could you explain the new deterministic failures with the latest prerelease you mentioned?

mkarlesky commented 3 months ago

@awiegel A little progress update… Some of what you reported caused me to think about changes in how test runners are generated. And, in fact, the prerelease version you first referenced is only a week or two older than those changes. Threading behavior is hard, as we all know. I think I see some gaps in thread safety those runner generation changes may have opened. It's hard to say if what I have in mind is your problem, but I do think there's an opportunity to fortify some data structure threading protections. It may simply be that not enough people have used recent prerelease versions of Ceedling as intensely and with your specific configuration to have run into the same issue you have.

mkarlesky commented 3 months ago

@awiegel The latest prerelease has additional threading protection. I am not sure what to expect. On the one hand, I can't see any code paths that would have tripped on the lack of thread protection I just added. On the other hand, circumstantial evidence and my gut say what I changed may be the source of your inconsistent builds. Only time and your own testing will tell.

awiegel commented 2 months ago

@mkarlesky A little testing update from my side. With the latest prerelease (1.0.0-3d9cd04), I still get the same errors.

mkarlesky commented 2 months ago

@awiegel Thank you for the follow-up. We've run some stress testing and have not yet triggered the problem. We're retooling to run better multi-threaded stress testing now. I know you are not able to share your code. However, could you share anything at all about your project and about the failing test? Are you using a lot of mocks? No mocks? Is your test suite exercising a great deal of memory operations? Do you have large test files with many, many test cases? Any complicated macros or conditional compilation scenarios? What is your build rigging (e.g. Jenkins, CircleCI, GitHub Actions, etc.)? Are you capturing logs directly from Ceedling or capturing your Ceedling $stdout output as a log using your build system? Could you share an anonymized version of the failing test case? Anything that stands out to you as unique about your project might help us.

mkarlesky commented 1 month ago

@awiegel We will soon have a new prerelease build that we believe fixes the issues you were experiencing. When it's ready, if you are able, would you be in a place to try it with your project? These problems are quite difficult to reproduce. The most reliable means of knowing the fixes have worked is running them with known failing projects.

mkarlesky commented 1 month ago

@awiegel If you are able, please give this latest Ceedling 1.0.0 prerelease a try. We believe it fixes the issues you reported.

awiegel commented 3 weeks ago

@mkarlesky Thank you for working on the fixes! Unfortunately, the first two errors (undefined reference and file not found) are still present. The random assert failure seems to be fixed, but because it is so rare, I cannot tell for sure.

However, the tests now take around 3 times longer, which is the same behavior as when setting threads to 1.

Tested with pre-release 1.0.0-af4f1ad.

mkarlesky commented 2 weeks ago

@awiegel Unfortunately, we are at a loss on where to go from here. Since you're not able to share your project there's little more troubleshooting we can think of to try. We have not seen your remaining errors in any of our testing, nor has anyone else reported these.

The best course of action is, sadly, to close this issue, release 1.0.0, and await more bug reports from others who are in a place to share more details.

Thank you for submitting this issue and hanging with us. What you could share, combined with details in other reports, did help us find and fix a critical and tricky threading problem. If you think of any more details you can share, please reopen this issue.