corretto / amazon-corretto-crypto-provider

The Amazon Corretto Crypto Provider is a collection of high-performance cryptographic implementations exposed via standard JCA/JCE interfaces.
Apache License 2.0
238 stars 56 forks source link

Enforce write lock on MiscSingleThreadedTests #220

Closed WillChilds-Klein closed 2 years ago

WillChilds-Klein commented 2 years ago

Issue #, if available: n/a

Description of changes:

We've been seeing some intermittent CI failures (both internally and externally). These failures manifest in a few ways: assertions that keys translated by different providers aren't the same, buffer lengths differing from expected size, etc. These failures are consistent with unexpected providers performing intended operations. One way this can happen is a race in our tests. Some of our tests install ACCP as a JVM-global provider, while the rest of our tests assume that ACCP is not installed. Because we run our JUnit tests in parallel within the same JVM process, if one of the installing tests wins the race, subsequent losing tests will have their uninstalled-ACCP assumptions violated.

Taking a look at which tests install ACCP:

$ ag -l '\.install\(\)' tst/
tst/com/amazon/corretto/crypto/provider/test/RecursiveInitializationTest.java
tst/com/amazon/corretto/crypto/provider/test/integration/TestCertificateGenerator.java
tst/com/amazon/corretto/crypto/provider/test/MiscSingleThreadedTests.java
tst/com/amazon/corretto/crypto/provider/test/SecureRandomBench.java
tst/com/amazon/corretto/crypto/provider/test/SecureRandomGenerator.java
tst/com/amazon/corretto/crypto/provider/test/SecurityPropertyRecursiveTester.java
tst/com/amazon/corretto/crypto/provider/test/TestProviderInstallation.java
tst/com/amazon/corretto/crypto/provider/test/integration/TestHTTPSServer.java

All installing tests other than TestProviderInstallation.java and MiscSingleThreadedTests.java are run as standalone tests in a separate process from our tests run by the JUnit test runner. TestProviderInstallation.java already had a write lock installed, but MiscSingleThreadedTests.java hadn't. So, we add a write lock to ensure that it does not run concurrently (and thus potentially interfere) with any other tests.

Finally, we add @AfterClass methods to installing JUnit tests to ensure that ACCP is not installed if these tests are run before any others.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

WillChilds-Klein commented 2 years ago

For the "correctness" tests, one dimension failed in the build batch:

Screen Shot 2022-06-15 at 11 28 17 PM

Failures (1):
  JUnit Jupiter:LibCryptoRngTest:largeRequest()
    MethodSource [className = 'com.amazon.corretto.crypto.provider.test.LibCryptoRngTest', methodName = 'largeRequest', methodParameterTypes = '']
    => org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
       org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
       org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
       org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35)
       org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)
       com.amazon.corretto.crypto.provider.test.LibCryptoRngTest.largeRequest(LibCryptoRngTest.java:160)
       java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
       java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       java.base/java.lang.reflect.Method.invoke(Method.java:568)
       org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
       [...]

Test run finished after 198927 ms
[       139 containers found      ]
[         2 containers skipped    ]
[       137 containers started    ]
[         0 containers aborted    ]
[       137 containers successful ]
[         0 containers failed     ]
[     66641 tests found           ]
[         1 tests skipped         ]
[     66640 tests started         ]
[        75 tests aborted         ]
[     66564 tests successful      ]
[         1 tests failed          ]

The failed test case does not appear to install JVM-global providers, instead specifying a provider in individual static method calls. All good there. However, it does rely on a test class instance variable being (sequentially) set up/torn down between cases. I haven't looked into JUnit internals to see whether it instantiates the test class once per @Test case. If the test runner does instantiate per-@Test case, we're good (the non-static @BeforeEach/@AfterEach won't interfere across instances). If the test runner reuses the same test class instance while parallelizing individual @Test cases that rely on a thread-unsafe instance variable, there's a race condition that could explain this failure. I'll defer that rabbit hole pursuant to below.

Additionally, the failing test case contains a comment:

        // Ensure that the resulting bytes haven't been left at zero
        // Probablistically, this will pass.

Non-determinism is fun. I'm going to kick the CI to see if this is failure recurs. In either case (i.e. whether LibCryptoRngTest has a race or doesn't due to test runner internals), the changes in this PR are a net-benefit and unrelated. But if this test continues to fail intermittently, it's likely we should look at how JUnit instantiates LibCryptoRngTest and executes its @Test cases.