ZuInnoTe / hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Apache License 2.0

sbt dependencies are out of date #68

Closed (phelps-sg closed this issue 3 years ago)

phelps-sg commented 4 years ago

The build.sbt dependencies for the examples are out of date (e.g. spark-sql is still on 2.0.1).
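For context, the pin in question in the examples' build.sbt presumably looks something like this (an illustrative excerpt, not the verbatim file):

```scala
// examples build.sbt (illustrative): the Spark pin reported as out of date
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.1" % "provided"
```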

jornfranke commented 4 years ago

Hi,

Thanks. Those dependencies are just "provided", i.e. they will not be part of any application; the versions of the cluster are used instead. We have tested that it works with 2.4 etc.

So updating them in the build file will have no effect.
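For readers less familiar with sbt scopes, this is what a "provided" dependency looks like in a build.sbt (a minimal sketch; the version string is illustrative):

```scala
// "provided": available on the compile classpath, but expected to be
// supplied by the runtime environment (here: the Spark cluster), so it
// is not bundled into the application artifact.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
```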

Best regards


phelps-sg commented 4 years ago

This is fine for deployment but not for development. If I import the project into an IDE such as IntelliJ, the IDE will download and link the out-of-date libraries. The dependencies still need to be kept up to date.

jornfranke commented 4 years ago

Well, I will look into it, but normally I would expect that for a Spark application you define the right versions of the Spark dependencies in the application itself and do not depend on the ones defined in a third-party library. Then your IDE will also show the versions that you actually work with.

The reason is that others may use Spark 2.3.0 instead of 2.4.5, so you can never get it right for everyone in the third-party build.sbt.
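A sketch of this approach in an application's own build.sbt (the datasource coordinates and version numbers are assumptions for illustration):

```scala
// Application build.sbt: pin the Spark version that matches your cluster.
libraryDependencies ++= Seq(
  // HadoopCryptoLedger Spark datasource (assumed coordinates)
  "com.github.zuinnote" %% "spark-hadoopcryptoledger-ds" % "1.2.0",
  // Your own Spark pin takes precedence over whatever the library was built against.
  "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
)
```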


phelps-sg commented 4 years ago

Yes, it is correct to use "provided". However, if the developer imports the project into the IDE via build.sbt, the IDE will use the versions specified in that build.sbt. The "provided" tag means that when you compile a fat jar using 'sbt assembly', the jars tagged as provided are not included in it, so when we submit the job via Spark it will use whatever versions the production environment puts on the classpath. In the development environment, however, you end up using the default versions specified in the build.sbt.
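In concrete terms, the behaviour described above looks roughly like this (a sketch; the plugin version is illustrative):

```scala
// project/plugins.sbt: sbt-assembly builds the fat jar
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: compiled against, but excluded from the fat jar produced by
// `sbt assembly`; at spark-submit time the cluster's Spark jars are used.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
```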

phelps-sg commented 4 years ago

See https://stackoverflow.com/questions/36437814/how-to-work-efficiently-with-sbt-spark-and-provided-dependencies?rq=1
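The workaround discussed there, for reference, is to put the provided dependencies back on the run classpath so that `sbt run` (and an IDE delegating to sbt) can launch the application locally (sbt 1.x syntax; a general sketch, not project-specific advice):

```scala
// build.sbt: `run` normally uses the runtime classpath, which excludes
// provided dependencies; this override runs with the compile classpath,
// which includes them.
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated
```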

jornfranke commented 4 years ago

Well, step by step. If you build a project using spark-hadoopcryptoledger as a dependency, then you put the right Spark versions in the application's build.sbt, and you have the right Spark versions on the IntelliJ build path. IntelliJ will not see the build.sbt of spark-hadoopcryptoledger when the package is included as a dependency in the application's build.sbt (that is impossible: build.sbt is NOT published on Maven).

What could happen is that IntelliJ additionally pulls in the version information from the pom file of the dependency. It is somewhat incorrect behaviour of sbt to write provided dependencies into the pom file; this I can fix for the next release. Updating the libraries as originally suggested will not help, as you can never match all the possible versions of Spark that a developer might use, and it is naive to assume that Enterprise deployments move to the latest version of Spark in zero time. That also means developers need to work with an older version of Spark if they want to provide patches for the Spark application running in production.

I propose to fix the pom file for the next release so that sbt does not write the provided dependencies into it. Then IntelliJ will not pull in those Spark libraries and will use only the ones specified in the application's build.sbt.
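One way such a fix is commonly implemented is a pomPostProcess rule that drops provided-scope entries from the POM sbt generates on publish (a sketch of the general technique, not necessarily what the release actually does):

```scala
// build.sbt: strip provided-scope dependencies from the published POM
import scala.xml.{Elem, Node, NodeSeq}
import scala.xml.transform.{RewriteRule, RuleTransformer}

pomPostProcess := { (node: Node) =>
  new RuleTransformer(new RewriteRule {
    override def transform(n: Node): NodeSeq = n match {
      // drop <dependency> elements whose <scope> is "provided"
      case e: Elem if e.label == "dependency" && (e \ "scope").text == "provided" =>
        NodeSeq.Empty
      case other => other
    }
  }).transform(node).head
}
```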

jornfranke commented 4 years ago

Also, just to avoid misunderstandings: can you share the build.sbt file of your application?

jornfranke commented 4 years ago

Sorry, I was confused. You meant the dependencies of the examples and not the datasource. That is a valid point; I will work on it.

jornfranke commented 4 years ago

I pushed a new version, 1.2.1, as well as updates to the examples. Please let me know if they work for you.
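For a downstream project, picking up the release would amount to bumping the pin along these lines (the datasource coordinates are assumed for illustration):

```scala
libraryDependencies += "com.github.zuinnote" %% "spark-hadoopcryptoledger-ds" % "1.2.1"
```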

jornfranke commented 3 years ago

No further feedback on this issue.