dstl / baleen

Entity Extraction Text Processor
Apache License 2.0

Baleen Graph doesn't build on Java 9 #79

Closed: jbaker-nca closed this issue 5 years ago

jbaker-nca commented 6 years ago

I did a clean pull of the GitHub repository, but when I tried to build the project, the build failed on the baleen-graph module.

[ERROR] Failures: 
[ERROR]   EntityGraphFileTest.testGraphson:99->assertPathsEqual:64 expected:<...e":1},"value":""}],"[docId":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]}}],"isNormalised":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":[false]]}}],"longestValue":...> but was:<...e":1},"value":""}],"[isNormalised":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":[false]}}],"docId":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]]}}],"longestValue":...>
[ERROR]   EntityGraphFileTest.testGyro:117
[INFO] 
[ERROR] Tests run: 41, Failures: 2, Errors: 0, Skipped: 0

I'm building it on Ubuntu 16.04 with OpenJDK version 1.8.0_171.

UPDATE: Maven was actually using Java 9 (and not Java 8), and that was the cause of the problem. I've updated the issue title to reflect this.

$ mvn -version
Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 9.0.4, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-9-oracle
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-29-generic", arch: "amd64", family: "unix"

stuarthendren commented 6 years ago

That looks like it builds fine, but there's a test failure. Those tests compare an existing reference file against a newly generated one, and in particular it looks like the Gryo (Kryo) version of the test is failing and that the id is different.

It has certainly been tested on Linux with OpenJDK 8. So, just to be sure, check that you haven't inadvertently overwritten the reference file.

The next most likely cause is that we have assumed the ordering in the Kryo file is deterministic when it is not, and that something about your combination of OpenJDK and platform exposes this. If so, the tests will have to be changed to be order-independent.

Another possibility is that the id is generated differently on your platform/JDK combination.

Can you check the full log, or set a breakpoint, to see whether it is one of these cases?

jbaker-nca commented 6 years ago

Yes, sorry: I meant that the tests failed, so a standard build failed (as opposed to a compile error). The reference file hasn't been changed; I've done a git reset --hard to confirm.

Two tests are failing in EntityGraphFileTest: testGraphson and testGyro. For testGraphson, it looks like an ordering issue (a section in the middle is in a different order):

08:56:23.405 [main] DEBUG org.springframework.validation.DataBinder - DataBinder requires binding of required fields [outputDirectory,format,defaultValueStrategyType,aggregate,multiValueProperties,contentHashAsId,outputEvents,filterFeatures,valueCoercerType]
08:56:23.406 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function initialize
08:56:23.493 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function initialize
08:56:23.503 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function process
08:56:23.519 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - DocumentGraph metadata skiped
08:56:23.523 [main] INFO uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Writing graph to output file /tmp/EntityGraphFileTest11891874241672594837/test.json
08:56:23.533 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function process
08:56:23.533 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function destroy
08:56:23.533 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function destroy

org.junit.ComparisonFailure: 
Expected :{"id":"a2526c484702c9d0219155b3caa09d2c22017d0b7cf035e257d1cb0b454c97d8","label":"Entity","inE":{"Relation":[{"id":"104d630293cf15b144a3628a7ecb6c66466a5e96573695a85246f75c5eeda0fa","outV":"3794f579617a933e63dd65ef9697e4eb3d8083cef332742444fa10b5eca739ef","properties":{"relationshipType":"lives","dependencyDistance":{"@type":"g:Int32","@value":0},"wordDistance":{"@type":"g:Int32","@value":0},"docId":"8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165","mentions":{"@type":"g:List","@value":[{"@type":"g:Map","@value":["confidence",{"@type":"g:Double","@value":0.0},"end",{"@type":"g:Int32","@value":71},"id","104d630293cf15b144a3628a7ecb6c66466a5e96573695a85246f75c5eeda0fa","begin",{"@type":"g:Int32","@value":63}]}]},"sentenceDistance":{"@type":"g:Int32","@value":0},"source":"cb7ba8e02c88dcdc832f181c1336ce54334f9bb125bd90371a6d59d098844f23","type":"Relation","value":"lives at","target":"a4c815e6f4a55afd579e93eae16090ab7f268acf7edd748600247280e45bc1e2"}}]},"properties":{"geoJson":[{"id":{"@type":"g:Int64","@value":2},"value":{"@type":"g:List","@value":["{ \"type\": \"Feature\", \"geometry\": {\"type\":\"Point\",\"coordinates\": [125.6, 10.1]},\"properties\": {\"name\": \"Dinagat Islands\"}}"]}}],"linking":[{"id":{"@type":"g:Int64","@value":1},"value":""}],"docId":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]}}],"isNormalised":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":[false]}}],"longestValue":[{"id":{"@type":"g:Int64","@value":7},"value":"Dinagat 
Islands"}],"mentions":[{"id":{"@type":"g:Int64","@value":0},"value":{"@type":"g:List","@value":[{"@type":"g:Map","@value":["confidence",{"@type":"g:Double","@value":0.9},"end",{"@type":"g:Int32","@value":87},"id","a4c815e6f4a55afd579e93eae16090ab7f268acf7edd748600247280e45bc1e2","begin",{"@type":"g:Int32","@value":72}]}]}}],"mostCommonValue":[{"id":{"@type":"g:Int64","@value":8},"value":"Dinagat Islands"}],"type":[{"id":{"@type":"g:Int64","@value":5},"value":{"@type":"g:List","@value":["Location"]}}],"value":[{"id":{"@type":"g:Int64","@value":6},"value":{"@type":"g:List","@value":["Dinagat Islands"]}}]}}
Actual   :{"id":"a2526c484702c9d0219155b3caa09d2c22017d0b7cf035e257d1cb0b454c97d8","label":"Entity","inE":{"Relation":[{"id":"104d630293cf15b144a3628a7ecb6c66466a5e96573695a85246f75c5eeda0fa","outV":"3794f579617a933e63dd65ef9697e4eb3d8083cef332742444fa10b5eca739ef","properties":{"relationshipType":"lives","dependencyDistance":{"@type":"g:Int32","@value":0},"wordDistance":{"@type":"g:Int32","@value":0},"docId":"8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165","mentions":{"@type":"g:List","@value":[{"@type":"g:Map","@value":["confidence",{"@type":"g:Double","@value":0.0},"end",{"@type":"g:Int32","@value":71},"id","104d630293cf15b144a3628a7ecb6c66466a5e96573695a85246f75c5eeda0fa","begin",{"@type":"g:Int32","@value":63}]}]},"sentenceDistance":{"@type":"g:Int32","@value":0},"source":"cb7ba8e02c88dcdc832f181c1336ce54334f9bb125bd90371a6d59d098844f23","type":"Relation","value":"lives at","target":"a4c815e6f4a55afd579e93eae16090ab7f268acf7edd748600247280e45bc1e2"}}]},"properties":{"geoJson":[{"id":{"@type":"g:Int64","@value":2},"value":{"@type":"g:List","@value":["{ \"type\": \"Feature\", \"geometry\": {\"type\":\"Point\",\"coordinates\": [125.6, 10.1]},\"properties\": {\"name\": \"Dinagat Islands\"}}"]}}],"linking":[{"id":{"@type":"g:Int64","@value":1},"value":""}],"isNormalised":[{"id":{"@type":"g:Int64","@value":3},"value":{"@type":"g:List","@value":[false]}}],"docId":[{"id":{"@type":"g:Int64","@value":4},"value":{"@type":"g:List","@value":["8b408a0c7163fdfff06ced3e80d7d2b3acd9db900905c4783c28295b8c996165"]}}],"longestValue":[{"id":{"@type":"g:Int64","@value":7},"value":"Dinagat 
Islands"}],"mentions":[{"id":{"@type":"g:Int64","@value":0},"value":{"@type":"g:List","@value":[{"@type":"g:Map","@value":["confidence",{"@type":"g:Double","@value":0.9},"end",{"@type":"g:Int32","@value":87},"id","a4c815e6f4a55afd579e93eae16090ab7f268acf7edd748600247280e45bc1e2","begin",{"@type":"g:Int32","@value":72}]}]}}],"mostCommonValue":[{"id":{"@type":"g:Int64","@value":8},"value":"Dinagat Islands"}],"type":[{"id":{"@type":"g:Int64","@value":5},"value":{"@type":"g:List","@value":["Location"]}}],"value":[{"id":{"@type":"g:Int64","@value":6},"value":{"@type":"g:List","@value":["Dinagat Islands"]}}]}}

    at org.junit.Assert.assertEquals(Assert.java:115)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at uk.gov.dstl.baleen.consumers.file.EntityGraphFileTest.assertPathsEqual(EntityGraphFileTest.java:64)
    at uk.gov.dstl.baleen.consumers.file.EntityGraphFileTest.testGraphson(EntityGraphFileTest.java:99)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

For testGyro, it looks like a file path issue:

08:56:22.981 [main] DEBUG org.springframework.validation.DataBinder - DataBinder requires binding of required fields [outputDirectory,format,defaultValueStrategyType,aggregate,multiValueProperties,contentHashAsId,outputEvents,filterFeatures,valueCoercerType]
08:56:22.982 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function initialize
08:56:23.148 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function initialize
08:56:23.159 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function process
08:56:23.179 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - DocumentGraph metadata skiped
08:56:23.220 [main] INFO uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Writing graph to output file /tmp/EntityGraphFileTest7773039366148030115/test.kyro
08:56:23.231 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function process
08:56:23.231 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Starting function destroy
08:56:23.231 [main] DEBUG uk.gov.dstl.baleen.consumers.file.EntityGraph[unknown] - Finishing function destroy

java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at uk.gov.dstl.baleen.consumers.file.EntityGraphFileTest.testGyro(EntityGraphFileTest.java:117)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

jbaker-nca commented 6 years ago

For testGyro, the two paths being compared (and failing) at line 117 of EntityGraphFileTest are:

Expected path: /home/bakerj/baleen/baleen-graph/target/test-classes/uk/gov/dstl/baleen/consumers/file/entity.kyro
Actual path: /tmp/EntityGraphFileTest932891212351411219/test.kyro

stuarthendren commented 6 years ago

I can't reproduce the failures here.

The .kyro test paths look reasonable; the temp directory is obviously platform dependent. The failure is in com.google.common.io.Files.equal. Can you step into that and see whether it's one of the earlier checks or the asByteSource comparison that fails?

For the ordering issue, I've changed the test to sort the lines before comparing them. This is a weaker assertion, so assuming it passes for you, it would be interesting to understand why the problem happens for you but not for anyone else. I've pushed it here so you can test it before I make a pull request.
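A minimal sketch of the line-sorting comparison described above, assuming the reference and generated files are UTF-8 text. The class and method names are hypothetical; this is not the code that was pushed, only an illustration of the technique.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: compares expected and actual output ignoring line order. */
public class SortedLinesCompare {

    /** True if both lists hold the same lines (duplicates included), in any order. */
    static boolean sameLinesIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    /** File variant: reads both files as UTF-8 and compares their sorted lines. */
    static boolean sameLinesIgnoringOrder(Path expected, Path actual) throws IOException {
        return sameLinesIgnoringOrder(
                Files.readAllLines(expected, StandardCharsets.UTF_8),
                Files.readAllLines(actual, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            System.out.println(sameLinesIgnoringOrder(Paths.get(args[0]), Paths.get(args[1])));
            return;
        }
        // Same lines in a different order: treated as equal.
        System.out.println(sameLinesIgnoringOrder(
                Arrays.asList("{\"a\":1}", "{\"b\":2}"),
                Arrays.asList("{\"b\":2}", "{\"a\":1}")));                 // true
        // Same fields reordered inside one line: still a difference.
        System.out.println(sameLinesIgnoringOrder(
                Arrays.asList("{\"docId\":1,\"isNormalised\":2}"),
                Arrays.asList("{\"isNormalised\":2,\"docId\":1}")));       // false
    }
}
```

Because each line is compared as a whole string, this weaker assertion tolerates reordered lines but still reports fields that are reordered within a single line.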

jbaker-nca commented 6 years ago

Ah right, I (wrongly) assumed that com.google.common.io.Files.equal was comparing paths rather than contents. It is failing on the following line of com.google.common.io.Files.equal:

return asByteSource(file1).contentEquals(asByteSource(file2));
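To see which stage of such a comparison decides the result, here is a JDK-only sketch that loosely mirrors the checks Guava's Files.equal performs (same file, then a length check that is skipped when either length is 0, then a full content comparison) and also reports the first differing byte. The class, enum and method names are hypothetical; this is a diagnostic aid, not Guava's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: reports which stage of a file comparison decided the result. */
public class FileCompareStages {

    enum Outcome { SAME_FILE, LENGTH_DIFFERS, CONTENT_DIFFERS, CONTENT_EQUAL }

    /** Index of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // One array is a prefix of the other: they differ at the shorter length.
        return a.length == b.length ? -1 : n;
    }

    static Outcome compare(Path expected, Path actual) throws IOException {
        // Stage 1: the same path is trivially equal; no I/O needed.
        if (expected.equals(actual)) {
            return Outcome.SAME_FILE;
        }
        // Stage 2: cheap length check, skipped when either length is 0
        // (some special files report 0 even when they have content).
        long len1 = Files.size(expected);
        long len2 = Files.size(actual);
        if (len1 != 0 && len2 != 0 && len1 != len2) {
            return Outcome.LENGTH_DIFFERS;
        }
        // Stage 3: full byte-by-byte content comparison.
        byte[] b1 = Files.readAllBytes(expected);
        byte[] b2 = Files.readAllBytes(actual);
        return firstMismatch(b1, b2) < 0 ? Outcome.CONTENT_EQUAL : Outcome.CONTENT_DIFFERS;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FileCompareStages <expected-file> <actual-file>");
            return;
        }
        Path expected = Paths.get(args[0]);
        Path actual = Paths.get(args[1]);
        System.out.println("outcome: " + compare(expected, actual));
        System.out.println("first differing byte: "
                + firstMismatch(Files.readAllBytes(expected), Files.readAllBytes(actual)));
    }
}
```

For example, it could be run against the reference entity.kyro under target/test-classes and the generated test.kyro in the temp directory. Assuming Guava checks in the same order, failing on the contentEquals line means the earlier checks passed, so the files had equal (or unreported) lengths but different bytes.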

Your changes didn't fix the ordering issue. I don't think it's the order of the lines in the file, but rather the order of the JSON elements within a line: isNormalised and docId (and their property ids, 3 and 4) appear to be swapped.
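If the swap comes from properties being numbered while iterating a map whose iteration order is unspecified (HashMap's, for example), which this thread does not confirm, one way to make the output deterministic is to number the properties in sorted-key order. This is a hypothetical sketch; it is not how Baleen or TinkerPop assign property ids, and the values used are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch: number properties in sorted-key order so that the ids do not
 * depend on the iteration order of the input map.
 */
public class DeterministicPropertyIds {

    static Map<String, Long> assignIds(Map<String, ?> properties) {
        Map<String, Long> ids = new LinkedHashMap<>(); // preserves the sorted insertion order
        long next = 0;
        // TreeSet iterates the keys in natural (lexicographic) order.
        for (String key : new TreeSet<>(properties.keySet())) {
            ids.put(key, next++);
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("docId", "doc-1");
        first.put("isNormalised", false);

        Map<String, Object> second = new LinkedHashMap<>(); // same entries, opposite order
        second.put("isNormalised", false);
        second.put("docId", "doc-1");

        System.out.println(assignIds(first));  // {docId=0, isNormalised=1}
        System.out.println(assignIds(second)); // {docId=0, isNormalised=1}
    }
}
```

The trade-off is that ids no longer reflect the order in which properties were added, so existing reference files would need regenerating.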

JohnDaws commented 6 years ago

James, I assume that you and Stuart haven't got any further with resolving this.

I have set up a VM with the same configuration as you, and can build Baleen and run the tests of Baleen-Graph without error, so it seems like it may be an issue at your end.

john@john-VirtualBox:~/baleen/baleen-graph$ java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
john@john-VirtualBox:~/baleen/baleen-graph$ mvn -version
Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 1.8.0_171, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-29-generic", arch: "amd64", family: "unix"
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 41, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:08 min
[INFO] Finished at: 2018-08-06T11:52:59+01:00
[INFO] Final Memory: 26M/152M
[INFO] ------------------------------------------------------------------------

Other than possible slight differences between Ubuntu versions, the only other difference I can think of is that I built a fresh clone of the repo with a newly installed Maven 3.3.9, so I will have pulled all dependencies down from scratch.

Is there any other relevant information you can provide about your setup so that we can try to reproduce the issue? Alternatively, if you are able to provide a pull request that resolves it from your end, that would be great!

Thanks, John

jbaker-nca commented 6 years ago

I compared my output to yours above and noticed that, whilst java was using Java 8, my mvn was using Java 9 (I have both installed). After switching Maven to Java 8 it built fine, so it looks like this is actually an issue with building on Java 9. I'll update the issue title to reflect that.
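Since java and mvn can pick up different JDKs when several are installed, a quick check is to print the properties of the JVM that is actually running the code. A minimal sketch, assuming it is run (or majorVersion is called from a test) under the same JDK as the build; the class name and warning text are hypothetical.

```java
/** Hypothetical helper: reports which JVM is running and its major version. */
public class WhichJava {

    /** Maps the pre-9 "1.x" scheme and the newer scheme: "1.8" -> 8, "9" -> 9, "11" -> 11. */
    static int majorVersion(String specVersion) {
        String s = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("major        = " + major);
        if (major != 8) {
            System.out.println("Warning: running on Java " + major + ", not Java 8");
        }
    }
}
```

Comparing java -version with mvn -version (which reports the Java home Maven itself runs on) is what revealed the mismatch in this case.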

stuarthendren commented 6 years ago

OK, I'm not sure what the official line is on Java 9 support for Baleen.

JohnDaws commented 6 years ago

Unfortunately we currently only build and test with Java 8, however we would actively welcome open source contributions that improve compatibility with later versions of Java. I will leave this issue open in hope!

chrisflatley commented 5 years ago

Just a heads up.

We've started a branch so the code builds with tests passing on Java 11, see https://github.com/commitd/baleen/tree/jdk11

Still need to look into this issue, but once done, we'll raise a PR (end of the week).

It's largely just dependency updates (and a few trivial code changes for Odin). Hopefully it addresses and closes:

JohnDaws commented 5 years ago

That sounds fantastic. I don't think that we move quickly enough to keep up with the 6 monthly releases of Java, but having support for the next Long Term Support version would be amazing.

jbaker-dstl commented 5 years ago

This has been fixed as of the latest snapshot build - any further issues, please re-open this issue or create a new one.