WrenSecurity / wrenicf-java-framework

The Wren Security Open Identity Connectors Framework and Toolkit for Java.
http://wrensecurity.org
2 stars 8 forks source link

Tests Fail Sporadically #4

Closed Kortanul closed 2 years ago

Kortanul commented 6 years ago

Versions Affected

1.5.2.0

Steps to Reproduce

  1. Checkout the project on the sustaining/1.5.2.0 branch.
  2. Run mvn clean install -Duser.name=Kortanul -Pforgerock-release -Dignore-artifact-sigs from the root of the project.
  3. If there are no failures, repeat step 2 once more.
  4. If there are test failures, re-start the build from where it failed via mvn clean install -Duser.name=Kortanul -Pforgerock-release -Dignore-artifact-sigs -rf FAILING-STEP (where FAILING-STEP is the step that failed, which is typically :connector-frameowork-internal).

Desired

All tests pass.

Actual

Some tests fail sporadically. When re-run they pass, even without any code changes.

Some example tests that fail sporadically include:

Failure 1

Thread Id: 1    Time: 2018-03-11 13:52:07.141   Class: org.identityconnectors.framework.impl.test.TestHelpersImpl       Method: fillConfiguration(TestHelpersImpl.java:147)     Level: WARN     Message: Configuration property foo does not exist!
Tests run: 469, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 168.836 sec <<< FAILURE! - in TestSuite
testBatchUseCase3(org.identityconnectors.framework.impl.api.LocalConnectorInfoManagerTests)  Time elapsed: 0.012 sec  <<< FAILURE!
java.lang.AssertionError: expected [0] but found [1]
        at org.identityconnectors.framework.impl.api.LocalConnectorInfoManagerTests.testBatchUseCase3(LocalConnectorInfoManagerTests.java:323)

Results :

Failed tests:
  LocalConnectorInfoManagerTests.testBatchUseCase3:323 expected [0] but found [1]

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19:test (default-test) on project connector-framework-internal: There are test failures.
pavelhoral commented 2 years ago

I thought that this is connected to the timeouts being too short, but I am more and more convinced it is some sort of locking race condition because I am not able to reproduce this with cgroups (limiting cpu cores or cpu time) or when running surefire:test in a loop on a single failing module.

pavelhoral commented 2 years ago

Running tests with no limit:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Wren:ICF 1.5.3.0-SNAPSHOT:
[INFO] 
[INFO] Wren:ICF ........................................... SUCCESS [  0.193 s]
[INFO] Wren:ICF - License Bundle .......................... SUCCESS [  0.087 s]
[INFO] Wren:ICF - Framework Maven Parent .................. SUCCESS [  0.008 s]
[INFO] Wren:ICF - Framework Core .......................... SUCCESS [  1.127 s]
[INFO] Wren:ICF - Framework Test Library .................. SUCCESS [  0.835 s]
[INFO] Wren:ICF - Test Library ............................ SUCCESS [  0.004 s]
[INFO] Wren:ICF - Test Bundle v1 .......................... SUCCESS [  0.089 s]
[INFO] Wren:ICF - Test Bundle v2 .......................... SUCCESS [  0.008 s]
[INFO] Wren:ICF - Framework Internal ...................... SUCCESS [02:28 min]
[INFO] Wren:ICF - Framework OSGi .......................... SUCCESS [  0.015 s]
[INFO] Wren:ICF - Framework Protocol Buffer Messages ...... SUCCESS [  0.009 s]
[INFO] Wren:ICF - Framework Remote Procedure Call API ..... SUCCESS [  1.572 s]
[INFO] Wren:ICF - Framework Server Core ................... SUCCESS [ 11.696 s]
[INFO] Wren:ICF - Server Jetty Integration ................ SUCCESS [ 43.306 s]
[INFO] Wren:ICF - Server Grizzly Integration .............. SUCCESS [01:30 min]
[INFO] Wren:ICF - Framework - SLF4J Logging Provider ...... SUCCESS [  0.010 s]
[INFO] Wren:ICF - Java ZIP ................................ SUCCESS [  1.449 s]
[INFO] Wren:ICF - Connector Bundle Parent ................. SUCCESS [  0.010 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:01 min
[INFO] Finished at: 2022-07-14T09:05:24+02:00
[INFO] ------------------------------------------------------------------------

Running tests limiting only to single CPU core (sudo cgset -r cpuset.cpus=1 wrenicf):

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Wren:ICF 1.5.3.0-SNAPSHOT:
[INFO] 
[INFO] Wren:ICF ........................................... SUCCESS [  0.330 s]
[INFO] Wren:ICF - License Bundle .......................... SUCCESS [  0.171 s]
[INFO] Wren:ICF - Framework Maven Parent .................. SUCCESS [  0.032 s]
[INFO] Wren:ICF - Framework Core .......................... SUCCESS [  2.881 s]
[INFO] Wren:ICF - Framework Test Library .................. SUCCESS [  1.803 s]
[INFO] Wren:ICF - Test Library ............................ SUCCESS [  0.004 s]
[INFO] Wren:ICF - Test Bundle v1 .......................... SUCCESS [  0.194 s]
[INFO] Wren:ICF - Test Bundle v2 .......................... SUCCESS [  0.017 s]
[INFO] Wren:ICF - Framework Internal ...................... SUCCESS [02:43 min]
[INFO] Wren:ICF - Framework OSGi .......................... SUCCESS [  0.062 s]
[INFO] Wren:ICF - Framework Protocol Buffer Messages ...... SUCCESS [  0.029 s]
[INFO] Wren:ICF - Framework Remote Procedure Call API ..... SUCCESS [  1.126 s]
[INFO] Wren:ICF - Framework Server Core ................... SUCCESS [ 13.694 s]
[INFO] Wren:ICF - Server Jetty Integration ................ SUCCESS [01:28 min]
[INFO] Wren:ICF - Server Grizzly Integration .............. SUCCESS [01:32 min]
[INFO] Wren:ICF - Framework - SLF4J Logging Provider ...... SUCCESS [  0.024 s]
[INFO] Wren:ICF - Java ZIP ................................ SUCCESS [  0.030 s]
[INFO] Wren:ICF - Connector Bundle Parent ................. SUCCESS [  0.008 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  06:06 min
[INFO] Finished at: 2022-07-14T09:15:50+02:00
[INFO] ------------------------------------------------------------------------

Running tests only on single core with 20 % CPU allocation (sudo cgset -r cpu.max="200000 1000000" wrenicf):

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Wren:ICF 1.5.3.0-SNAPSHOT:
[INFO] 
[INFO] Wren:ICF ........................................... SUCCESS [  1.952 s]
[INFO] Wren:ICF - License Bundle .......................... SUCCESS [  0.186 s]
[INFO] Wren:ICF - Framework Maven Parent .................. SUCCESS [  0.831 s]
[INFO] Wren:ICF - Framework Core .......................... SUCCESS [ 14.135 s]
[INFO] Wren:ICF - Framework Test Library .................. SUCCESS [  8.906 s]
[INFO] Wren:ICF - Test Library ............................ SUCCESS [  0.003 s]
[INFO] Wren:ICF - Test Bundle v1 .......................... SUCCESS [  0.994 s]
[INFO] Wren:ICF - Test Bundle v2 .......................... SUCCESS [  0.025 s]
[INFO] Wren:ICF - Framework Internal ...................... SUCCESS [04:18 min]
[INFO] Wren:ICF - Framework OSGi .......................... SUCCESS [  0.549 s]
[INFO] Wren:ICF - Framework Protocol Buffer Messages ...... SUCCESS [  0.048 s]
[INFO] Wren:ICF - Framework Remote Procedure Call API ..... SUCCESS [  6.044 s]
[INFO] Wren:ICF - Framework Server Core ................... SUCCESS [ 27.017 s]
[INFO] Wren:ICF - Server Jetty Integration ................ SUCCESS [01:26 min]
[INFO] Wren:ICF - Server Grizzly Integration .............. SUCCESS [02:18 min]
[INFO] Wren:ICF - Framework - SLF4J Logging Provider ...... SUCCESS [  0.028 s]
[INFO] Wren:ICF - Java ZIP ................................ SUCCESS [  0.022 s]
[INFO] Wren:ICF - Connector Bundle Parent ................. SUCCESS [  0.806 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  09:15 min
[INFO] Finished at: 2022-07-14T09:31:54+02:00
[INFO] ------------------------------------------------------------------------
pavelhoral commented 2 years ago

Getting pretty stable failures with mvn surefire:test -pl framework-rpc (they are a bit different than the one from connector-server-grizzly module, but the underlying issue might be the same):

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.394 s <<< FAILURE! - in TestSuite
[ERROR] org.forgerock.openicf.common.rpc.RequestDistributorTest.testSimpleRequest  Time elapsed: 0.006 s  <<< FAILURE!
java.lang.AssertionError: expected [true] but found [false]
        at org.forgerock.openicf.common.rpc.RequestDistributorTest.testSimpleRequest(RequestDistributorTest.java:148)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   RequestDistributorTest.testSimpleRequest:148 expected [true] but found [false]
[INFO] 
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------

The RPC module is pretty complex and I am not sure why all the complexity is necessary. There are only 11 source files in that module but I feel like my head will explode while trying to understand all intentions and relations behind those components :/.

pavelhoral commented 2 years ago

I really don't understand why the RPC part of ICF is so so sooo complex :(. The issue from previous comment is really a race condition between RemoteRequestGroup cleanup performed in executor service thread and test assertion being performed in main thread:

All assertions based on getRemoteRequest property can not work with the current setup. Those requests are cleaned asynchronously after request completion.

pavelhoral commented 2 years ago

I have fixed few issues in the code and tests are now green (#26).

However I have to say that I am not very happy with the codebase. It is written in a way that makes it very hard to understand and work with. Also many places seem like race conditions are about to happen there... and the tests are very messy as well.

So I have one closing thought: Shouldn't we be better of just rewriting the 1.5 stuff?