feedzai / feedzai-openml-java

Implementations for Feedzai's OpenML APIs to allow for usage of machine learning models in the Java programming language.
https://www.feedzai.com
Apache License 2.0
2 stars 11 forks source link

bug in tests: Java surefire forking is causing LightGBM to fail with serious C++ issues #122

Open AlbertoEAF opened 2 years ago

AlbertoEAF commented 2 years ago

As of #121 LightGBM tests can still fail after the first run with a serious C++ error from the LightGBM side:

[ERROR] munmap_chunk(): invalid pointer
[ERROR] Aborted (core dumped)

After that the following messages will show up:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Openml LightGBM lib 3.0.1-with_model_locale_fix_for_java_and_streaming SUCCESS [  1.659 s]
[INFO] openml-lightgbm-module 1.0.12-SNAPSHOT ............. SUCCESS [  0.378 s]
[INFO] openml-lightgbm 1.0.12-SNAPSHOT .................... FAILURE [  9.691 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12.312 s
[INFO] Finished at: 2022-05-27T14:48:38+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project openml-lightgbm: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider && /home/alberto.ferreira/.sdkman/candidates/java/8.0.302-open/jre/bin/java -javaagent:/home/alberto.ferreira/.m2/repository/org/jacoco/org.jacoco.agent/0.8.4/org.jacoco.agent-0.8.4-runtime.jar=destfile=/home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/jacoco.exec -jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire/surefirebooter2472734679044173114.jar /home/alberto.ferreira/code/external/feedzai-openml-java/openml-lightgbm/lightgbm-provider/target/surefire 2022-05-27T14-48-29_842-jvmRun1 surefire8810960277057352224tmp surefire_07484658466683880775tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] com.feedzai.openml.provider.lightgbm.LightGBMBinaryClassificationModelTrainerTest
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:305)
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:265)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
[ERROR]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR]     at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR]     at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
[ERROR]     at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
[ERROR]     at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR]     at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)

How to reproduce

# Go to repo root and run:
mvn clean install -DskipTests
cd openml-lightgbm  # Go to provider directory to run its tests.

Tests will always run with:

mvn clean test

However, they will always fail (at least after the first run) with:

mvn test

Conclusion

surefire settings: How to have mvn test work all the time

By simply changing maven-surefire-plugin's forking settings (see [1]) it's possible to have mvn test work all the time too.

This is the observed behaviour regarding mvn test:

Update: All fork settings can still fail after all. The failure odds change though. The only command that works all the time is still mvn clean test. This means the approaches described below don't fix the issue.

Analysis

Given that disabling forks altogether vs running the tests in a forked process (default) causes the issues, the problem seems caused by Java's test stack and not by the LightGBM C++ side (which always works with mvn clean test anyway).

Note: Upgrading to the latest versions of jacoco 0.8.6 and maven-surefire-plugin to 3.0.0-M6 didn't fix the issue either.

Possible solutions

Now, we can disable forks altogether, instead of forking the process before starting the tests once (default, which is failing); or we can set tests not to reuse forks. The latter option is slower as it forks the process for every test class.

Are there any downsides to doing these approaches? And which of those approaches is best?

References

1. maven-surefire-plugin forking docs

AlbertoEAF commented 2 years ago

Apparently the issue is somewhere in these tests related to LGBM feature contributions, from the LightGBMBinaryClassificationModelTrainerTest test class:

Disabling them lets tests run all the time with mvn test. Can you review if we're doing something wrong in the Java provider @mlobofeedzai ?

Otherwise, I'm afraid it's a bug at the LGBM level related with fast prediction handles, and having several to our booster: https://github.com/microsoft/LightGBM/issues/4980#issuecomment-1039038250, which would be much trickier to fix.