Closed awiegel closed 2 weeks ago
Do you find the same behavior on latest pre-release version?
@awiegel -- In the project section of your project.yml file, there are likely two settings:
:test_threads: 8
:compile_threads: 8
If you set these to 1, do you still get the failures?
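As a minimal sketch of that single-threaded configuration (assuming both settings sit directly under the project section of project.yml, as described above, and that the rest of the file stays unchanged):

```yaml
:project:
  # Assumption: both keys live in the project section, per the comment above.
  # Setting each to 1 disables parallel test runs and parallel compilation.
  :test_threads: 1
  :compile_threads: 1
```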
Similarly, if you run your tests without the gcov plugin, does it still produce failures?
These symptoms sound like the result of your file system not keeping up with the new threading. I had seen this on Windows with earlier pre-release versions. I haven't seen them with the latest releases, but that doesn't mean there aren't some creeping issues still to be found.
I apologize that you've run into this and hope we can uncover the source!
@awiegel First of all, thank you so much for hammering prereleases so hard! The 1% failure rate certainly sounds like classic nondeterministic behavior such as with threading.
At least one other community member has been using prerelease builds for large, complex, multi-platform test suite builds successfully. They were finding threading bugs early on (reported outside GitHub). Since then, we thought we had found and fixed those problems. Perhaps not! The build you are referencing is months more recent than all that work.
To pile on some other thoughts / questions:
As Mark suggested, please do let us know if cranking down threading to single threads makes a difference. That said, it sounds like it is a non-trivial thing to simply re-run thousands of builds with a changed configuration.
Thank you for the quick responses!
I've tested different things now:
gcov with test produces the same errors.
If it helps, the tests run in a Docker Ubuntu container which is executed on a Windows PC.
Unfortunately, I cannot share any project data.
If setting threading to 1 solves the runtime problem, then you have a bad setup/teardown for the tests, as it means tests are overwriting each other's shared memory. So your "virtualization" is not done correctly (you didn't say which CPU and so on, so we can't really point you in a better direction), and I would look into your general memory layout for the problems.
@awiegel Well, we're learning something here. I'm trying to think of what to ask since confidentiality is a hurdle here.
Could you explain the new deterministic failures with the latest prerelease you mentioned?
@awiegel A little progress update… Some of what you reported caused me to think about changes in how test runners are generated. And, in fact, the prerelease version you first referenced is only a week or two older than those changes. Threading behavior is hard, as we all know. I think I see some gaps in thread safety those runner generation changes may have opened. It's hard to say if what I have in mind is your problem, but I do think there's an opportunity to fortify some data structure threading protections. It may simply be that not enough people have used recent prerelease versions of Ceedling as intensely and with your specific configuration to have run into the same issue you are.
@awiegel The latest prerelease has additional threading protection. I am not sure what to expect. On the one hand, I can't see any code paths that would have tripped on the lack of thread protection I just added. On the other hand, circumstantial evidence and my gut say what I changed may be the source of your inconsistent builds. Only time and your own testing will tell.
@mkarlesky A little testing update from my side. With the latest prerelease (1.0.0-3d9cd04), I still get the same errors.
The random compilation failures disappear when I set :compile_threads: to 1 (:test_threads: doesn't seem to affect the failures, so it works on :auto). So I guess there is still some error in the multithreaded compilation process, because I don't see how a bad test setup could provoke such random failures.
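For reference, a minimal sketch of the configuration that avoided the random compilation failures (assuming both settings sit under the project section of project.yml, as described earlier in this thread; everything else in the file is unchanged):

```yaml
:project:
  # Assumption: both keys live in the project section of project.yml.
  :compile_threads: 1    # serial compilation; the random compile failures disappeared
  :test_threads: :auto   # left on :auto; did not seem to affect the failures
```

This only sidesteps the problem by serializing compilation, so it costs build time rather than fixing the underlying cause.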
The random test assert failure occurs too rarely to really test it (1 out of 1000 times when running the whole test suite). It could be some bad test setup. At least it's the same test with the same assert that fails. However, when I execute only this specific test, it never fails.
@awiegel Thank you for the followup. We've run some stress testing and have not yet triggered the problem. We're retooling to run better multi-threaded stress testing now. I know you are not able to share your code. However, could you share anything at all about your project and about the failing test? Are you using a lot of mocks? No mocks? Is your test suite exercising a great deal of memory operations? Do you have large test files with many, many test cases? Any complicated macros or conditional compilation scenarios? What is your build rigging (e.g. Jenkins, CircleCI, GitHub Actions, etc.)? Are you capturing logs directly from Ceedling or capturing your Ceedling $stdout output as a log using your build system? Could you share an anonymized version of the failing test case? Anything that stands out to you as unique about your project might help us.
@awiegel We will soon have a new prerelease build that we believe fixes the issues you were experiencing. When it's ready, if you are able, would you be in a place to try it with your project? These problems are quite difficult to reproduce. The most reliable means of knowing the fixes have worked is running them with known failing projects.
@awiegel If you are able, please give this latest Ceedling 1.0.0 prerelease a try. We believe it fixes the issues you reported.
@mkarlesky Thank you for working on the fixes! Unfortunately, the first two errors with undefined reference and file not found are still present. The random assert failure seems to be fixed, but because it is so rare, I cannot tell for sure.
However, the tests now take around 3 times longer, which is the same behavior as when setting threads to 1.
Tested with pre-release 1.0.0-af4f1ad.
@awiegel Unfortunately, we are at a loss on where to go from here. Since you're not able to share your project there's little more troubleshooting we can think of to try. We have not seen your remaining errors in any of our testing, nor has anyone else reported these.
The best course of action is, sadly, to close this issue, release 1.0.0, and await more bug reports from others who are in a place to share more details.
Thank you for submitting this issue and hanging with us. What you could share, combined with details in other reports, did help us find and fix a critical and tricky threading problem. If you think of any more details you can share, please reopen this issue.
We have three different Ceedling projects that normally run without errors. However, they sometimes fail at random.
We executed the tests over 1000 times, and around 1% of them failed.
Some of the errors were:
EXCEPTION: ShellExecutionException ==> 'Default Gcov Linker' (gcc) terminated with exit code [1] and output >> "/usr/bin/ld: ... undefined reference to ...
-> the file with the definition is 100% referenced in the project.yml.
EXCEPTION: CeedlingException ==> Found no file 'abc.c' in search paths. However, a filename having different capitalization was found: '.../.../.../.../.../abc.c'.
-> there is only one file with this name in the whole project.
Occurred on the pre-release gem ceedling-0.32.0-2f246f1.
Has anyone else noticed their projects failing randomly?