ZuInnoTe / hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Apache License 2.0

Use long to represent 4-byte unsigned integers #70

Closed phelps-sg closed 3 years ago

phelps-sg commented 3 years ago

The fields in org.zuinnote.hadoop.bitcoin.format.common.Block use Java's primitive int. The problem here is that Java integers have a sign bit, whereas the blockchain protocol uses 4-byte unsigned integers. This means that field values of 2^31 or greater will be represented as negative values.

Ideally the fields should be represented internally in Java as 64-bit long, and then serialized or deserialized as 4-byte unsigned integers.
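A minimal sketch of what this could look like, assuming a hypothetical helper class `UInt32Codec` (it is not part of hadoopcryptoledger): the four little-endian bytes are read into an int and then widened to a non-negative long, and the reverse direction writes only the low 32 bits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of hadoopcryptoledger.
final class UInt32Codec {
    private UInt32Codec() {}

    /** Reads 4 little-endian bytes at offset and returns a value in [0, 2^32 - 1]. */
    static long readUInt32LE(byte[] raw, int offset) {
        int bits = ByteBuffer.wrap(raw, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // Widen without sign extension, so 0xFFFFFFFF becomes 4294967295 rather than -1.
        return Integer.toUnsignedLong(bits);
    }

    /** Writes value as 4 little-endian bytes; rejects values outside the unsigned 32-bit range. */
    static byte[] writeUInt32LE(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("not a 32-bit unsigned value: " + value);
        }
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt((int) value) // the cast keeps the low 32 bits unchanged
                .array();
    }
}
```

With this approach the in-memory field costs 8 bytes instead of 4, while the serialized form stays at 4 bytes.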

jornfranke commented 3 years ago

Are you sure this is needed? For the int fields it does not matter, as they need to be converted anyway (e.g. time etc.). So there is no need to have them as long. The fields are just containers that store the bits. For example, time - it can be converted to a real datetime based on the int. May I know the use case that you have in mind?

jornfranke commented 3 years ago

I propose to add the conversion functions to BitcoinUtil to keep the data size in Block minimal.
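As a rough sketch of that idea, the fields could stay as int and be widened only on demand. The class and method names below are hypothetical and are not the actual BitcoinUtil API; the only library calls used are `Integer.toUnsignedLong` and `Integer.toUnsignedString` from Java 8 and later.

```java
// Hypothetical conversion helpers; the names are illustrative, not the BitcoinUtil API.
final class UnsignedIntViews {
    private UnsignedIntViews() {}

    /** Interprets the 32 stored bits as an unsigned value; equivalent to bits & 0xFFFFFFFFL. */
    static long asUnsignedLong(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    /** Decimal text of the unsigned interpretation, e.g. for logging or export. */
    static String asUnsignedString(int bits) {
        return Integer.toUnsignedString(bits);
    }
}
```

The stored field remains 4 bytes, and callers that need the unsigned value pay for the conversion only when they ask for it.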

phelps-sg commented 3 years ago

> The fields are just containers that store the bits.

If this were true, then why bother to parse them into typed fields at all? You could just store the original little-endian raw bits.

> For example, time - it can be converted to a real datetime based on the int.

  • Sometimes you just care about the time-ordering and/or the time-intervals which you can compute using the standard operators - and <. Negative values would break this.
  • The standard way to convert an integer time-stamp (epoch time) to a java.util.Date is through the constructor. However, if we pass a negative epoch we obtain dates before the epoch:
scala> new Date(-10000)
res1: java.util.Date = Thu Jan 01 00:59:50 GMT 1970
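To illustrate both points in Java, here is a small sketch that treats a 32-bit timestamp as unsigned when ordering, subtracting, and converting. The class `BlockTimeDemo` and its method names are hypothetical. It assumes the timestamp counts seconds since the Unix epoch, as in the Bitcoin block header, whereas the `java.util.Date` constructor expects milliseconds.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of hadoopcryptoledger.
final class BlockTimeDemo {
    private BlockTimeDemo() {}

    /** True if timestamp a is earlier than b when both are read as unsigned 32-bit values. */
    static boolean isBefore(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    /** Interval in seconds from earlier to later, with both read as unsigned 32-bit values. */
    static long secondsBetween(int earlier, int later) {
        return Integer.toUnsignedLong(later) - Integer.toUnsignedLong(earlier);
    }

    /** Converts an unsigned 32-bit count of epoch seconds to an Instant. */
    static Instant toInstant(int epochSeconds) {
        return Instant.ofEpochSecond(Integer.toUnsignedLong(epochSeconds));
    }
}
```

For example, with `a = Integer.MAX_VALUE` and `b = Integer.MIN_VALUE` (the bit patterns for 2^31 - 1 and 2^31), the plain expression `a < b` is false, while `isBefore(a, b)` is true and `secondsBetween(a, b)` is 1.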

Yes, we can go back to the original bits and reparse them as a long, but then why bother parsing them as an int? We would have to do the work twice.

> I propose to add the conversion functions to BitcoinUtil to keep the data size in Block minimal.

See https://stackoverflow.com/questions/6909414/do-methods-in-class-instances-take-a-place-in-memory.

phelps-sg commented 3 years ago

I accept, though, that changing from int to long could increase the memory and storage footprint significantly.

For the date field it is not a problem for a while yet ;-), and probably not for the bits.

The nonce could overflow, though. It's not a huge deal, but it makes the value inconsistent with a standard block explorer.
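A small sketch of the mismatch, assuming a block explorer prints the nonce as an unsigned decimal string. The class `NonceCheck` and its methods are hypothetical; the example value is synthetic and not taken from a real block.

```java
// Hypothetical helper for illustration; not part of hadoopcryptoledger.
final class NonceCheck {
    private NonceCheck() {}

    /** Renders a nonce stored as int the way an unsigned decimal display would. */
    static String toExplorerString(int storedNonce) {
        return Integer.toUnsignedString(storedNonce);
    }

    /** Compares a stored int nonce with an unsigned decimal string by comparing their 32-bit patterns. */
    static boolean matchesExplorerValue(int storedNonce, String explorerDecimal) {
        return Integer.parseUnsignedInt(explorerDecimal) == storedNonce;
    }
}
```

A nonce with all 32 bits set is stored as the int -1, so `String.valueOf(-1)` gives "-1", while `toExplorerString(-1)` gives "4294967295".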

jornfranke commented 3 years ago

The pull request does not match the issue content.

Please do not work on any issue before it has been discussed and agreed. Otherwise we cannot accept pull requests.