Random Failures on same Tests

awiegel commented 3 months ago

We have three different Ceedling projects that normally run without errors. However, they occasionally fail at random.

We executed the tests over 1000 times, and around 1% of them failed.
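For anyone wanting to reproduce this kind of measurement, a minimal Python sketch follows. It is not the script used for the numbers above; `measure_failure_rate` is a hypothetical helper, and the command passed to it (for example `["ceedling", "test:all"]`) is an assumption to adapt to your own project.

```python
import subprocess
from collections import Counter


def measure_failure_rate(cmd, runs):
    """Run `cmd` `runs` times; return (failure_rate, exit_code_counts, failed_outputs).

    Hypothetical helper for illustration only.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    exit_codes = Counter()
    failed_outputs = []  # combined output of each failing run, for later inspection
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        exit_codes[result.returncode] += 1
        if result.returncode != 0:
            failed_outputs.append(result.stdout + result.stderr)

    failures = runs - exit_codes[0]
    return failures / runs, exit_codes, failed_outputs
```

Keeping the output of every failing run makes it possible to group the failures by error message afterwards.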

Some of the errors were:

EXCEPTION: ShellExecutionException ==> 'Default Gcov Linker' (gcc) terminated with exit code [1] and output >> "/usr/bin/ld: ... undefined reference to ..." -> The file with the definition is definitely referenced in the project.yml.
EXCEPTION: CeedlingException ==> Found no file 'abc.c' in search paths. However, a filename having different capitalization was found: '.../.../.../.../.../abc.c'. -> There is only one file with this name in the whole project.
A test assert fails and reports a different value than usual.

Occurred on the pre-release gem ceedling-0.32.0-2f246f1.

ceedling version

Ceedling => 0.32.0
CMock => 2.5.4
Unity => 2.6.0
CException => 1.3.4

Has anyone else noticed their projects failing randomly?

deltalejo commented 3 months ago

Do you see the same behavior on the latest pre-release version?

mvandervoord commented 3 months ago

@awiegel -- In the project section of your project.yml file, there are likely two settings:

  :test_threads: 8
  :compile_threads: 8

If you set these to 1, do you still get the failures?

Similarly, if you run your tests without the gcov plugin, does it still produce failures?

These symptoms sound like the result of your file system not keeping up with the new threading. I saw this on Windows with earlier pre-release versions. I haven't seen it with the latest releases, but that doesn't mean there aren't some creeping issues still to be found.

I apologize that you've run into this and hope we can uncover the source!

mkarlesky commented 3 months ago

@awiegel First of all, thank you so much for hammering prereleases so hard! The 1% failure rate certainly sounds like classic nondeterministic behavior such as with threading.

At least one other community member has been using prerelease builds successfully for large, complex, multi-platform test suite builds. They were finding threading bugs early on (reported outside of GitHub). Since then, we thought we had found and fixed those problems. Perhaps not! The build you are referencing is months more recent than all that work.

To pile on some other thoughts / questions:

The test assertion failure stands out as an especially curious problem to me. That sort of failure is typically associated with memory dereferencing issues. It might be a subtle bug in CMock or Unity. It might be that your test is comparing two values and was historically lucky on how memory references shook out. Maybe the updated tools are now disturbing memory layouts on rare occasion. Would you be able to share any of the source and test code around that problem? Feel free to anonymize it as appropriate.
The filename capitalization error is also curious. I actually wonder if that's not the problem at all, but something else that is triggering that validation logic and reporting. That area within Ceedling's code is complicated. Would you be able to share any further details, code, or configuration snippets related to this problem?
I'm quite unsure of what to think about the missing reference failure. My only request there is for any further project details, snippets, etc. that you can share.

As Mark suggested, please do let us know if cranking down threading to single threads makes a difference. That said, it sounds like it is a non-trivial thing to simply re-run thousands of builds with a changed configuration.

awiegel commented 3 months ago

Thank you for the quick responses!

I've tested different things now:

Using the latest pre-release gem produces the same errors. Also, it introduced some deterministic failures, which I have to investigate further.
Replacing gcov with test produces the same errors.
Setting threading to 1 indeed fixes all three problems! However, without threading, the tests take around 3 times longer. Right now the best solution for us is to just retry the tests if such a nondeterministic failure occurs.
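A retry workaround of this kind could look like the following minimal Python sketch. It is an illustration under assumptions, not the setup actually used here: `run_with_retry` is a hypothetical helper, the marker strings are taken from the two error messages quoted earlier in this thread, and the command (for example `["ceedling", "test:all"]`) and retry count need adapting.

```python
import subprocess
import sys

# Substrings taken from the intermittent errors quoted in this thread.
FLAKY_MARKERS = ("undefined reference to", "Found no file")


def run_with_retry(cmd, max_attempts=3, markers=FLAKY_MARKERS):
    """Run `cmd`, rerunning only when it fails with a known flaky error.

    Returns (exit_code, attempts_used). Hypothetical helper for illustration.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        sys.stdout.write(output)  # keep the build log visible
        if result.returncode == 0:
            return 0, attempt
        if not any(marker in output for marker in markers):
            # Unknown failure: treat it as real and do not hide it by retrying.
            return result.returncode, attempt
    # Every attempt hit a flaky marker; report the last exit code.
    return result.returncode, max_attempts


# Example call (assumed command): run_with_retry(["ceedling", "test:all"])
```

Retrying only on known markers means a genuine test failure still fails the build on the first attempt. The rare assert failure would not be retried by this sketch, since its output has no fixed marker.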

If it helps, the tests run in a Docker Ubuntu container that is executed on a Windows PC.

Unfortunately, I cannot share any project data.

Letme commented 3 months ago

If setting threading to 1 solves the runtime problem, then you have bad setup/teardown for the tests, as it means the tests are overwriting each other's shared memory. So your "virtualization" is not done correctly (you didn't say what CPU and so on you are using, so we can't really point you in a better direction), and I would look into your general memory layout for the problems.

mkarlesky commented 3 months ago

@awiegel Well, we're learning something here. I'm trying to think of what to ask since confidentiality is a hurdle here.

Could you explain the new deterministic failures with the latest prerelease you mentioned?

mkarlesky commented 3 months ago

@awiegel A little progress update… Some of what you reported caused me to think about changes in how test runners are generated. And, in fact, the prerelease version you first referenced is only a week or two older than those changes. Threading behavior is hard, as we all know. I think I see some gaps in thread safety those runner generation changes may have opened. It's hard to say if what I have in mind is your problem, but I do think there's an opportunity to fortify some data structure threading protections. It may simply be that not enough people have used recent prerelease versions of Ceedling as intensely and with your specific configuration to have run into the same issue you have.
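To illustrate the general kind of gap meant by missing data structure protection, here is a generic Python sketch. It is not Ceedling's actual Ruby code, and `SharedRegistry` and `hammer` are hypothetical names. The point is only that a read-modify-write on a shared structure needs a lock when several threads touch it.

```python
import threading


class SharedRegistry:
    """Hypothetical structure shared between worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, key):
        # Read-modify-write is not atomic. Without the lock, two threads can
        # read the same old value and one of the updates is silently lost.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def count(self, key):
        with self._lock:
            return self._counts.get(key, 0)


def hammer(registry, key, threads=8, iterations=1000):
    """Call registry.record(key) from several threads at the same time."""

    def worker():
        for _ in range(iterations):
            registry.record(key)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
```

With the lock in place, the final count always equals `threads * iterations`. Races of this kind typically show up only rarely, which would be consistent with a failure rate of around 1%.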

mkarlesky commented 3 months ago

@awiegel The latest prerelease has additional threading protection. I am not sure what to expect. On the one hand, I can't see any code paths that would have tripped on the lack of thread protection I just added. On the other hand, circumstantial evidence and my gut say what I changed may be the source of your inconsistent builds. Only time and your own testing will tell.

awiegel commented 2 months ago

@mkarlesky A little testing update from my side. With the latest prerelease (1.0.0-3d9cd04), I still get the same errors.

The random compilation failures disappear when I set :compile_threads: to 1 (:test_threads: doesn't seem to affect the failures, so it works on :auto). So I guess there is still some error in the multithreaded compilation process, because I don't see how a bad test setup could provoke such random failures.
The random test assert failure occurs too rarely to really test it (1 out of 1000 times when running the whole test suite). It could be some bad test setup. At least it's always the same test with the same assert that fails. However, when I execute only this specific test, it never fails.

mkarlesky commented 2 months ago

@awiegel Thank you for the follow-up. We've run some stress testing and have not yet triggered the problem. We're retooling to run better multi-threaded stress testing now. I know you are not able to share your code. However, could you share anything at all about your project and about the failing test? Are you using a lot of mocks? No mocks? Is your test suite exercising a great deal of memory operations? Do you have large test files with many, many test cases? Any complicated macros or conditional compilation scenarios? What is your build rigging (e.g. Jenkins, CircleCI, GitHub Actions, etc.)? Are you capturing logs directly from Ceedling or capturing your Ceedling $stdout output as a log using your build system? Could you share an anonymized version of the failing test case? Anything that stands out to you as unique about your project might help us.

mkarlesky commented 1 month ago

@awiegel We will soon have a new prerelease build that we believe fixes the issues you were experiencing. When it's ready, if you are able, would you be in a place to try it with your project? These problems are quite difficult to reproduce. The most reliable means of knowing the fixes have worked is running them with known failing projects.

mkarlesky commented 1 month ago

@awiegel If you are able, please give this latest Ceedling 1.0.0 prerelease a try. We believe it fixes the issues you reported.

awiegel commented 3 weeks ago

@mkarlesky Thank you for working on the fixes! Unfortunately, the first two errors with undefined reference and file not found are still present. The random assert failure seems to be fixed, but because it is so rare, I cannot tell for sure.

However, the tests now take around 3 times longer, which is the same behavior as when setting threads to 1.

Tested with pre-release 1.0.0-af4f1ad.

mkarlesky commented 2 weeks ago

@awiegel Unfortunately, we are at a loss on where to go from here. Since you're not able to share your project there's little more troubleshooting we can think of to try. We have not seen your remaining errors in any of our testing, nor has anyone else reported these.

The best course of action is, sadly, to close this issue, release 1.0.0, and await more bug reports from others who are in a place to share more details.

Thank you for submitting this issue and hanging with us. What you could share, combined with details in other reports, did help us find and fix a critical and tricky threading problem. If you think of any more details you can share, please reopen this issue.

ThrowTheSwitch / Ceedling

Random Failures on same Tests #915