Thanks for notifying us. In fact, we are not strictly bound to Bouncy Castle 1.58, and it is a provided dependency, so I do not see an issue with using Bouncy Castle 1.51. Are you referring to the example for the Spark data source? I would prefer to update the example...
I can update the example: https://github.com/ZuInnoTe/hadoopcryptoledger/blob/master/examples/scala-spark-datasource-ethereumblock to include only a compatible Bouncy Castle version
The underlying issue is that HCL depends on functionality that exists in 1.58 but not in 1.51. I've cross-referenced the issue I opened with HCL above (https://github.com/ZuInnoTe/hadoopcryptoledger/issues/37).
We've tried forcing HCL to use 1.51, but this failed because the features needed were not there.
I will investigate.
Can you please also open an issue with Apache Spark? It is not recommended to use such an old version of Bouncy Castle.
We already have, at the source of the issue: https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency
Can you please provide me with the exact list of dependencies that you are using? The integration tests for the Spark data sources with Spark 2.2.0 show no issue with BC 1.58. I will also investigate on the cluster. Are you using a specific distribution? Are you sure that the Bouncy Castle dependency comes from Spark? Can you point me to the build file in Apache Spark where it is mentioned?
The Bouncy Castle dependency that causes the issue comes from the jets3t project Omer linked above.
Spark 2.1.2 depends on jets3t:0.7.1, which does not depend on Bouncy Castle. However, Spark 2.2.0 depends on jets3t:0.9.3, which depends on bcprov-jdk15on:1.51.
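For reference, this is roughly how the two Bouncy Castle versions end up being requested side by side in an sbt build that uses both Spark 2.2.0 and the data source. This is only a sketch; the group/artifact coordinates for the HCL data source are an assumption based on the versions named in this thread.

```scala
// build.sbt (sketch) — both Bouncy Castle versions are requested at once.
libraryDependencies ++= Seq(
  // Spark 2.2.0 pulls in net.java.dev.jets3t:jets3t:0.9.3, which depends on org.bouncycastle:bcprov-jdk15on:1.51
  "org.apache.spark"    %% "spark-core"                  % "2.2.0" % "provided",
  "com.github.zuinnote" %% "spark-hadoopcryptoledger-ds" % "1.1.1",
  // HCL treats Bouncy Castle as a provided dependency, so 1.58 has to be declared explicitly
  "org.bouncycastle"     % "bcprov-ext-jdk15on"          % "1.58"
)
```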
I tried downloading JARs from Maven for hadoopcryptoledger-fileformat-1.1.1, bcprov-ext-jdk15on-1.58, and spark-hadoopcryptoledger-ds_2.11-1.1.1, and then running a test with:
/opt/spark-2/bin/spark-shell --jars 'spark-hadoopcryptoledger-ds-assembly-1.1.0.jar,hadoopcryptoledger-fileformat-1.1.1-all.jar'
which failed.
I managed to solve the issue by removing /opt/spark-2/jars/bcprov-jdk15on-1.51.jar, but that is basically removing Spark's own dependency and thus is not a viable solution.
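To see which copy of Bouncy Castle the driver actually loads before and after removing Spark's bundled jar, a quick check along these lines can be run in spark-shell. This is a sketch, not something from the thread:

```scala
// Run inside spark-shell to see which Bouncy Castle wins on the driver classpath.
import org.bouncycastle.jce.provider.BouncyCastleProvider

// Jar the class was loaded from (e.g. /opt/spark-2/jars/bcprov-jdk15on-1.51.jar)
println(classOf[BouncyCastleProvider].getProtectionDomain.getCodeSource.getLocation)
// Provider version, e.g. 1.51 before removing Spark's bundled jar, 1.58 afterwards
println(new BouncyCastleProvider().getVersion)
```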
For now, I have created a shadow JAR of hadoopcryptoledger-fileformat-1.1.1 that includes bcprov-jdk15on:1.58 but relocates it (as described here). After doing this I managed to get things to work. But this is clearly a hack; I hope jets3t updates its dependency soon, and Spark as well, so that I can remove it. Until then, your README should point out this issue.
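For reference, the relocation can be expressed with a shading rule. The sketch below uses sbt-assembly and is an assumption about the setup; the shadow JAR above may well have been built with a different tool (e.g. the Gradle Shadow plugin).

```scala
// build.sbt (requires the sbt-assembly plugin) — relocate Bouncy Castle inside the fat jar so it
// cannot clash with the bcprov-jdk15on-1.51 that ships under /opt/spark-2/jars
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.bouncycastle.**" -> "shadebc.org.bouncycastle.@1").inAll
)
```

One caveat: relocation invalidates Bouncy Castle's jar signature, so this approach only works when it is used as a plain library rather than registered as a signed JCE provider.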
Yes, very valid point, thanks for the detailed investigation. I will add something to the wiki and the README.
Also, please note that changing the version in your build.sbt file for the tests might not reproduce the error: it would have to load two versions of Bouncy Castle, the old one (1.51) would be evicted in favor of the new one (1.58), and thus the tests would load the new one and pass.
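If someone does want to reproduce the failure in a unit test rather than on a cluster, one option (a sketch, assuming an sbt build) is to pin the old version explicitly so eviction cannot pick 1.58:

```scala
// build.sbt (sketch) — force the old Bouncy Castle that Spark 2.2 ships,
// mimicking the cluster classpath where the bundled bcprov-jdk15on-1.51.jar wins
dependencyOverrides += "org.bouncycastle" % "bcprov-jdk15on" % "1.51"
```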
OK, it should not happen, because they are run as integration tests, but I will also check the cluster later. Meanwhile: is the update/information in https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds/blob/master/README.md#information-spark-22-and-outdated-bouncy-castle-library correct/sufficient to close the issue?
Not really, since we'll need to follow up on the issue when a new version of Spark comes out.
When attempting to load an Ethereum blockchain and enrich the results, there is a conflict between the Bouncy Castle 1.51 that Spark 2.2+ bundles and the Bouncy Castle 1.58 that Hadoop Crypto Ledger 1.1.1 requires, which causes an exception.
This ought to be noted in the README file.