cleanzr / dblink

Distributed Bayesian Entity Resolution in Apache Spark

Java / Hadoop versions #6

Closed clblalock closed 5 years ago

clblalock commented 5 years ago

We are working to have our tech support create an EMR cluster for us to test your public data, and we want to ensure that our versions are correct so we can replicate your findings.

1) Your guide recommends the use of OpenJDK. The current version of OpenJDK is Java SE 13. Daniel hit an error while running dblink with Java 13, which went away when he switched to version 8. Have you tested dblink with Java 13? Can you indicate the version of OpenJDK / Java that you used for testing? Should I assume version 8?

2) Which version of JVM should we be using?

3) Which version of Hadoop did you use?

resteorts commented 5 years ago
  1. As stated in our guide, we only support Java 8: "The following two steps require that Java 8+ is installed on your system." Thus, we do not support Java 13.
  2. Again, you should be using Java 8.
  3. We used Hadoop 2.7, as shown in the guide (https://github.com/cleanzr/dblink/blob/master/docs/guide.md), which downloads Spark via `wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz`. Just to be completely clear, we used Spark 2.3.1; see the setup sketch below.
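For anyone replicating this on a fresh machine, here is a minimal setup sketch. It assumes an apt-based distro (on EMR / Amazon Linux the install step would use yum instead), and the JAVA_HOME path is the usual apt location; only the wget URL is taken directly from the guide.

```bash
# Install OpenJDK 8 (package name assumes an apt-based distro).
sudo apt-get install -y openjdk-8-jdk

# Point JAVA_HOME at the Java 8 install so Spark picks it up;
# adjust the path for your system.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"

# Confirm Java 8 is active: the version string should look like "1.8.0_XXX".
java -version

# Fetch the exact build referenced in the guide: Spark 2.3.1 on Hadoop 2.7.
wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
tar -xzf spark-2.3.1-bin-hadoop2.7.tgz
```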
clblalock commented 5 years ago

Can you explain what the + in "8+" is intended to mean? I.e., why Java 8+ rather than Java 8?

resteorts commented 5 years ago

8+ means all versions of Java 8. (A quick search will confirm this.) Every time Java 8 is updated, you get a new version number. :) We support all versions of Java 8, which is commonly referred to as 8+.
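For instance, a quick check (illustrative only; note that `java -version` writes to stderr, and the update number will differ on your machine):

```bash
# Every Java 8 update reports a version string of the form "1.8.0_<update>",
# e.g. 1.8.0_222 vs 1.8.0_232 -- all of them count as "Java 8+" here.
ver=$(java -version 2>&1 | head -n 1)
echo "$ver"

case "$ver" in
  *'"1.8.'*) echo "Java 8 detected: OK for dblink" ;;
  *)         echo "Not Java 8: dblink is only tested against Java 8" ;;
esac
```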

ngmarchant commented 5 years ago

> 8+ means all versions of Java 8. (A quick search will confirm this.) Every time Java 8 is updated, you get a new version number. :) We support all versions of Java 8, which is commonly referred to as 8+.

Yes @resteorts is correct.

The restriction to Java 8 is due to our dependency on Spark. I believe newer versions of Java will be supported in the upcoming Spark 3.0 release.
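In the meantime, a launcher script can fail fast on the wrong JVM before handing off to Spark. This is a sketch under assumptions: the spark-submit arguments, jar name, and config path are placeholders, not dblink's actual invocation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Spark 2.3.x only runs on Java 8, so refuse to launch on anything newer.
if ! java -version 2>&1 | head -n 1 | grep -q '"1\.8\.'; then
  echo "error: Java 8 is required; found: $(java -version 2>&1 | head -n 1)" >&2
  exit 1
fi

# Placeholder submit command: jar name and config path are illustrative.
./spark-2.3.1-bin-hadoop2.7/bin/spark-submit \
  --master 'local[*]' \
  dblink-assembly.jar \
  myconfig.conf
```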