Open ScottG489 opened 3 years ago
Here are my thoughts on each bullet pointed idea above.
Upgrading to JDK 13 seemed to fixed the issue. I don't really want to upgrade the JDK if possible, but don't think I have any good reason not to. It seems like it would be a good idea to keep up with the latest version if possible just like any other dependency?
jqwik
It doesn't seem like it's an issue with jqwik
because if I let the failing tests run but with their body's commented out the test passes. So to me that means jqwik is doing everything the same but my test isn't running and since it works it isn't jqwik
. However, it seems like there is some interplay with the rest of the test suite because if only the failing tests are run (--tests
) they pass. So this isn't completely ruled out. I'll have to play around with the rest of the test suite and see what is the external force that causes the offending tests to fail only when run with the whole suite.
Removing TempSecretsFileUtilTest
didn't seem to have an effect but removing ConfigTaskTest
fixed the issue. If either of its tests are enabled the failure is reproduced. The failure itself seems to occur in ConfigTask.execute()
. Specifically, the offending code happens somewhere here: String originalConfig = configFieldMethods.entrySet().stream().map(configEntry ->t;
. Line 53 (in the lambda) is never reached.
We define anonymous functions in this class and the error seems to have something to do with redefining methods. The JDK code that seems to blow up is in JDK 11src/hotspot/share/runtime/sharedRuntime.cpp
. Here is the method with the relevant comment:
// Resolves a call.
methodHandle SharedRuntime::resolve_helper(JavaThread *thread,
bool is_virtual,
bool is_optimized, TRAPS) {
methodHandle callee_method;
callee_method = resolve_sub_helper(thread, is_virtual, is_optimized, THREAD);
if (JvmtiExport::can_hotswap_or_post_breakpoint()) {
int retry_count = 0;
while (!HAS_PENDING_EXCEPTION && callee_method->is_old() &&
callee_method->method_holder() != SystemDictionary::Object_klass()) {
// If has a pending exception then there is no need to re-try to
// resolve this method.
// If the method has been redefined, we need to try again.
// Hack: we have no way to update the vtables of arrays, so don't
// require that java.lang.Object has been updated.
// It is very unlikely that method is redefined more than 100 times
// in the middle of resolve. If it is looping here more than 100 times
// means then there could be a bug here.
guarantee((retry_count++ < 100),
"Could not resolve to latest version of redefined method");
// method is redefined in the middle of resolve so re-try.
callee_method = resolve_sub_helper(thread, is_virtual, is_optimized, THREAD);
}
}
return callee_method;
}
I think we should refactor ConfigTask.java
and break it up into smaller classes and test those individually. Hopefully splitting up those tests will ease whatever issue is happening here.
If that doesn't work we'll have to circle back on this. I think after that we might want to play around with running a subset of tests to see what is causing the JRE crash only when the entire test suite is run. Some ideas there would be to see how many tests we can run before the crash (I think tests always run in order so try disabling tests after the crash and then binary search to find the number of tests that cause the crash), and try playing around with the number of tries in tests such as reducing them all to tries = 1
, etc. If tries seem to affect it then it might be a good idea to make an issue with jqwik
. Not that I think there's a bug in their lib, but to get some feedback on if they think using jqwik
might be exacerbating things to cause this issue.
Code in ConfigTask
was extracted into dependency classes and tests were refactored and added accordingly. ConfigStore
now has the bit of code which seems to be causing this problem (from previous comment):
Specifically, the offending code happens somewhere here:
String originalConfig = configFieldMethods.entrySet().stream().map(configEntry ->t;
. Line 53 (in the lambda) is never reached.
The test for the particular bit of code causing the problem is still reproducing it. For now I am going to just comment out this code. We'll lack unit test coverage there, but we should be able to get it covered with integration tests, and it's already covered with acceptance tests.
As discussed in the previous comment above, it seems that the solution here may come when we upgrade Java versions.
The failure seems to consistently fail at the same point.
Unfortunately it's only reproducible on the build server, which is bad because the whole point is to have parity. In any case, a few ideas to fix this: