ZuInnoTe / hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Apache License 2.0
141 stars, 51 forks

Flume Source to live stream Blockchain data into HDFS #11

Open jornfranke opened 7 years ago

jornfranke commented 7 years ago

Stream Bitcoin blockchain data live into HDFS for immediate analysis, and make it available to other applications as well (e.g. via Kafka).

This could be implemented as a Flume source or as a Kafka producer. The Flume source should:

1) Provide Bitcoin blocks to any Flume channel.
2) Provide Bitcoin block metadata (e.g. number of confirmations, checksum validation results) to any Flume channel.

Each metadata record relates to exactly one block and describes its full current state, not a delta. For example, the number of confirmations is always the currently known total, not the count of newly observed confirmations. Otherwise the consuming application would have to maintain this state itself, which usually leads to inconsistencies (e.g. the stored number of confirmations differs from the real one).

As a consequence, the Flume source needs a backend to manage state, and this backend should ideally be configurable. Via JDBC, one could connect to a variety of NoSQL databases (e.g. HBase, Ignite).
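To illustrate the "full state, not deltas" requirement, here is a minimal sketch in Java. It is not part of the project: `BlockMetadataStore` and `BlockMetadata` are hypothetical names, and the in-memory map is only a stand-in for the configurable state backend the issue asks for.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the configurable state backend. */
class BlockMetadataStore {

    /** Full metadata snapshot for one block; never a delta. */
    static final class BlockMetadata {
        private final String blockHash;
        private final long confirmations;    // total confirmations currently known
        private final boolean checksumValid; // result of checksum validation

        BlockMetadata(String blockHash, long confirmations, boolean checksumValid) {
            this.blockHash = blockHash;
            this.confirmations = confirmations;
            this.checksumValid = checksumValid;
        }

        String getBlockHash() { return blockHash; }
        long getConfirmations() { return confirmations; }
        boolean isChecksumValid() { return checksumValid; }
    }

    private final Map<String, BlockMetadata> byHash = new ConcurrentHashMap<>();

    /** Replaces any earlier snapshot for this block with the new full snapshot. */
    BlockMetadata upsert(String blockHash, long totalConfirmations, boolean checksumValid) {
        BlockMetadata snapshot = new BlockMetadata(blockHash, totalConfirmations, checksumValid);
        byHash.put(blockHash, snapshot);
        return snapshot;
    }

    /** Returns the latest known snapshot for the block, if any. */
    Optional<BlockMetadata> get(String blockHash) {
        return Optional.ofNullable(byHash.get(blockHash));
    }
}
```

Because every event carries the total, delivering the same event twice leaves the stored state unchanged. With deltas, a duplicated or lost event would permanently skew the count.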

Unit and integration tests must be provided. An example manual must also be provided that explains how to integrate the Flume source into any cluster with Flume support deployed. As a baseline, the example should show Bitcoin blocks being stored in HDFS files using append mode with a configurable file size (e.g. 128M), and metadata being stored in an updatable fashion in HBase.
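The size-based file rolling described above could look roughly like the following Java sketch. It is an assumption-laden illustration, not the project's code: `RollingBlockWriter` is a hypothetical name, and it only tracks byte counts in memory instead of appending to real HDFS files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical size-based roller; tracks byte counts only, no real HDFS I/O. */
class RollingBlockWriter {
    private final long maxFileBytes;                            // configurable size, e.g. 128L * 1024 * 1024
    private final List<Long> closedFileSizes = new ArrayList<>();
    private long currentFileBytes = 0;

    RollingBlockWriter(long maxFileBytes) {
        if (maxFileBytes <= 0) {
            throw new IllegalArgumentException("maxFileBytes must be positive");
        }
        this.maxFileBytes = maxFileBytes;
    }

    /** Appends one serialized block, rolling to a new file first if it would not fit. */
    void append(byte[] rawBlock) {
        // A block larger than the limit still gets written, alone in its own file.
        if (currentFileBytes > 0 && currentFileBytes + rawBlock.length > maxFileBytes) {
            roll();
        }
        currentFileBytes += rawBlock.length;
    }

    /** Closes the current file (if non-empty) and starts a new one. */
    void roll() {
        if (currentFileBytes > 0) {
            closedFileSizes.add(currentFileBytes);
            currentFileBytes = 0;
        }
    }

    /** Sizes of all files closed so far, in order. */
    List<Long> closedFileSizes() { return new ArrayList<>(closedFileSizes); }

    /** Bytes written to the file that is currently open. */
    long currentFileBytes() { return currentFileBytes; }
}
```

Rolling before the append, rather than after, keeps each block whole within a single file, so a reader never has to reassemble a block that was split across two files.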

jornfranke commented 6 years ago

We will design an architecture for blockchain analytics and provide selected implementations in https://github.com/ZuInnoTe/cryptoledgerstreamer