ZuInnoTe / hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Apache License 2.0

BitcoinBlock and BitcoinTransaction are not Serializable #66

Closed phelps-sg closed 3 years ago

phelps-sg commented 4 years ago

BitcoinBlock and BitcoinTransaction are not Serializable. This prevents them from being used in e.g. Spark SQL queries.

jornfranke commented 4 years ago

Thanks for reporting. We have examples in the examples folder showing how to use it with the Spark SQL context (Scala-spark-datasource-Bitcoin Block). Do these examples work for you?


phelps-sg commented 4 years ago

The examples work, but the project I am working on contains the code below:

    import scala.collection.JavaConverters._
    import org.apache.hadoop.io.BytesWritable
    import org.apache.spark.storage.StorageLevel

    val blocks =
      sc.newAPIHadoopFile(inputFile,
          classOf[BitcoinBlockFileInputFormat],
          classOf[BytesWritable],
          classOf[BitcoinBlock])

    val timedTxs =
      (for (block <- blocks.values; tx <- block.getTransactions.asScala)
        yield TimedTx(block.getTime, tx))
          .persist(StorageLevel.MEMORY_AND_DISK)

This throws the exception below unless I modify BitcoinTransaction so that it is Serializable. With the changes in PR #67 the code above works ok.

19/12/10 16:20:28 ERROR FileFormatWriter: Aborting job 6ca6d395-632b-4ce3-99ff-c114fc86852b.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4.0 in stage 0.0 (TID 4) had a not serializable result: org.zuinnote.hadoop.bitcoin.format.common.BitcoinTransaction
Serialization stack:
    - object not serializable (class: org.zuinnote.hadoop.bitcoin.format.common.BitcoinTransaction, value: org.zuinnote.hadoop.bitcoin.format.common.BitcoinTransaction@1bee62e8)
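The failure mode can be reproduced without Spark at all. A minimal, self-contained sketch (`TxLike` is a hypothetical stand-in for `BitcoinTransaction`, not the actual PR #67 diff): Spark's default `JavaSerializer` can only round-trip objects that implement `java.io.Serializable`, which is why the `persist` above fails until the class is marked Serializable.

```scala
import java.io._

// Hypothetical stand-in for BitcoinTransaction; the real class carries
// inputs, outputs, lock time, etc. Marking it Serializable is the whole fix.
class TxLike(val lockTime: Int) extends Serializable

// Java-serialization round trip, as Spark's default JavaSerializer performs
// when persisting with StorageLevel.MEMORY_AND_DISK.
def roundTrip[T <: Serializable](x: T): T = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(x)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    .readObject().asInstanceOf[T]
}
```

Without `extends Serializable`, `writeObject` throws a `NotSerializableException`, which is what surfaces as the Spark stage failure above.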
jornfranke commented 4 years ago

Ah, I ask because making this class Serializable may have an impact on the other platforms, so I will need to test it. I remember that we had issues with Serializable some time ago.

phelps-sg commented 4 years ago

It would be very strange if simply implementing Serializable broke anything. I have rerun all the tests and they passed ok. Do you have a unit test that illustrates the problem?

jornfranke commented 4 years ago

Let me check. I will look into this issue this week and publish a new version, if necessary.


jornfranke commented 4 years ago

I will run further tests. I was a little delayed - sorry.

However, here is also an alternative approach: https://github.com/ZuInnoTe/hadoopoffice/blob/master/examples/scala-spark-exceloutput/src/main/scala/org/zuinnote/spark/office/example/excel/SparkScalaExcelOut.scala

It demonstrates how you can make a Writable serializable in Spark.

jornfranke commented 4 years ago

Similarly, it seems Spark also introduced a class for this: https://spark.apache.org/docs/latest/api/java/org/apache/spark/SerializableWritable.html
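The idea behind `SerializableWritable` can be sketched without Spark or Hadoop on the classpath. `WritableLike` below is an illustrative stand-in for `org.apache.hadoop.io.Writable` (the real interface lives in hadoop-common); the wrapper delegates Java serialization to the Writable's own `write`/`readFields` binary format:

```scala
import java.io._

// Illustrative stand-in for org.apache.hadoop.io.Writable.
trait WritableLike {
  def write(out: DataOutput): Unit
  def readFields(in: DataInput): Unit
}

// A Writable with a no-arg constructor, as Hadoop requires.
class IntBox extends WritableLike {
  var n: Int = 0
  def write(out: DataOutput): Unit = out.writeInt(n)
  def readFields(in: DataInput): Unit = { n = in.readInt() }
}

// Serializable wrapper in the spirit of Spark's SerializableWritable:
// the wrapped value is transient, and custom writeObject/readObject
// route its state through the Writable's own write/readFields.
class SerializableBox[T <: WritableLike](@transient var value: T)
    extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeObject(value.getClass)
    value.write(out)
  }
  private def readObject(in: ObjectInputStream): Unit = {
    val cls = in.readObject().asInstanceOf[Class[T]]
    value = cls.getDeclaredConstructor().newInstance()
    value.readFields(in)
  }
}
```

This avoids touching the Writable class itself, at the cost of wrapping every value before it crosses a stage boundary.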

jornfranke commented 4 years ago

I have now tested all the code and I think we can make it serializable for convenience (however, I do recommend using special serializers, e.g. the KryoSerializer - or, if you want to use Spark SQL, the Spark data source is normally the better solution: https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds)
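For reference, opting into Kryo is a Spark configuration change. A sketch (the class names are taken from the stack trace above; the app name is an arbitrary example):

```scala
import org.apache.spark.SparkConf
import org.zuinnote.hadoop.bitcoin.format.common.{BitcoinBlock, BitcoinTransaction}

// Switch Spark's serializer to Kryo and register the classes that are
// shipped between stages, so Kryo can refer to them by compact IDs.
val conf = new SparkConf()
  .setAppName("bitcoin-analysis") // arbitrary example name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[BitcoinBlock], classOf[BitcoinTransaction]))
```

Kryo does not require the registered classes to implement `java.io.Serializable`, which is why it sidesteps this issue entirely.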

jornfranke commented 4 years ago

I included your proposed change and extended it to several other classes. This is part of 1.2.1. Please let me know if it works for you. Thank you for reporting and for the contribution.

jornfranke commented 3 years ago

no further comment