getTransactionHash and getTransactionHashSegwit should not throw IOException

ZuInnoTe / hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Apache License 2.0

141 stars 51 forks source link

getTransactionHash and getTransactionHashSegwit should not throw IOException #81

Closed phelps-sg closed 3 years ago

phelps-sg commented 3 years ago

The methods BitcoinUtil.getTransactionHash and BitcoinUtil.getTransactionHashSegwit() declare a throws clause with IOException but this is never thrown in practice. The underlying code that throws IOException is ByteArrayOutputStream.write() but here IOException is declared in the throws clause just for compatibility with OutputStream, and will never be thrown in practice. The code would be simpler if the hash methods would written as:

try {
  ByteArrayOutputStream transactionBAOS = new ByteArrayOutputStream();
  byte[] version = BitcoinUtil.reverseByteArray(BitcoinUtil.convertIntToByteArray(getVersion()));
  transactionBAOS.write(version);
  // etc.
} catch (IOException e) {
  throw new RuntimeException(e);   // Never happens
}

This would enable removal of redundant exception handling code in users of getTransactionHash and getTransactionHashSegwit and would simplify the API.

jornfranke commented 3 years ago

I don’t agree with that. The exception is declared by ByteArrayOutputStream - they can decide to change it at anytime. Swallowing exceptions and throwing a runtime exception is not a better practices. HadoopCryptoLedger is not responsible to fix issues in other interfaces.

Am 17.10.2020 um 16:56 schrieb Steve Phelps notifications@github.com:

The methods BitcoinUtil.getTransactionHash and BitcoinUtil.getTransactionHashSegwit() declare a throws clause with IOException but this is never thrown in practice. The underlying code that throws IOException is ByteArrayOutputStream.write() but here IOException is declared in the throws clause just for compatibility with OutputStream, and will never be thrown in practice. The code would be simpler if the hash methods would written as:

try { ByteArrayOutputStream transactionBAOS = new ByteArrayOutputStream(); byte[] version = BitcoinUtil.reverseByteArray(BitcoinUtil.convertIntToByteArray(getVersion())); // etc. } catch (IOException e) { throw new RuntimeException(e); // Never happens } This would enable removal of redundant exception handling code in users of getTransactionHash and getTransactionHashSegwit and would simplify the API.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

phelps-sg commented 3 years ago

ByteArrayOutputStream should never be redesigned to throw IOException because it should never do any actual IO in virtue of its documented purpose.
Although not totally unanimous, it is a common design pattern to deal with exceptions that are never thrown using RuntimeException or assertions (https://softwareengineering.stackexchange.com/questions/122233/how-to-deal-with-checked-exceptions-that-cannot-ever-be-thrown)
When faced with design alternatives, the KISS principle should be the ultimate decider. Catching unthrown exceptions reduces the size of the code base, decreasing bloat and increasing coverage.

phelps-sg commented 3 years ago

Also, if for some bizarre reason there was a legitimate IOException thrown within getTransactionHash, the only sensible way for the caller of getTransactionHash to deal with the exception would be to log an error and halt. There is nothing sensible the caller can do to mitigate or handle the exception. So every caller of getTransactionHash has to have exception handling code like:

try {
  byte[] hash = BitcoinUtil.getTransactionHash(tr);
} catch (IOException e) {
 logger.error("Couldn't get transaction hash: " + e.getMessage());
 assert false;   // This shouldn't happen and there is nothing sensible we can do so crash with error
}

This exception handling code will end up more or less duplicated throughout all apps. Therefore by DRY we should eliminate the duplication by pulling the exception-handling logic back into getTransactionHash().

In the very worse-case the proposed change would cause unit-tests to fail with a hypothetical new release of Java that broke the contract that ByteArrayOutputStream does not do any IO. In that very unlikely situation one could easily roll-back the change.

jornfranke commented 3 years ago

I think there is little benefit in this change.

Am 17.10.2020 um 17:58 schrieb Steve Phelps notifications@github.com:

Also, if for some bizarre reason there was a legitimate IOException thrown within getTransactionHash, the only sensible way for the caller of getTransactionHash to deal with the exception would be to log an error and halt. There is nothing sensible the caller can do to mitigate or handle the exception. So every caller of getTransactionHash has to have exception handling code like:

try { byte[] hash = BitcoinUtil.getTransactionHash(tr); } catch (IOException e) { logger.error("Couldn't get transaction hash: " + e.getMessage()); assert false; // This shouldn't happen and there is nothing sensible we can do so crash with error } This exception handling code will end up more or less duplicated throughout all apps. Therefore by DRY we should eliminate the duplication by pulling the exception-handling logic back into getTransactionHash().

In the very worse-case the proposed change would cause unit-tests to fail with a hypothetical new release of Java that broke the contract that ByteArrayOutputStream does not do any IO. In that very unlikely situation one could easily roll-back the change.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

phelps-sg commented 3 years ago

It eliminates lots of pointless exception handling code in my application. It is much simpler to just call getTransactionHash() without having a pointless try catch wrapper each and every time I want to get the hash.

Sent from ProtonMail mobile

-------- Original Message -------- On 17 Oct 2020, 17:28, Jörn Franke wrote:

I think there is little benefit in this change.

Am 17.10.2020 um 17:58 schrieb Steve Phelps notifications@github.com:

Also, if for some bizarre reason there was a legitimate IOException thrown within getTransactionHash, the only sensible way for the caller of getTransactionHash to deal with the exception would be to log an error and halt. There is nothing sensible the caller can do to mitigate or handle the exception. So every caller of getTransactionHash has to have exception handling code like:

try { byte[] hash = BitcoinUtil.getTransactionHash(tr); } catch (IOException e) { logger.error("Couldn't get transaction hash: " + e.getMessage()); assert false; // This shouldn't happen and there is nothing sensible we can do so crash with error } This exception handling code will end up more or less duplicated throughout all apps. Therefore by DRY we should eliminate the duplication by pulling the exception-handling logic back into getTransactionHash().

In the very worse-case the proposed change would cause unit-tests to fail with a hypothetical new release of Java that broke the contract that ByteArrayOutputStream does not do any IO. In that very unlikely situation one could easily roll-back the change.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

jornfranke commented 3 years ago

Well I don’t know your application and cannot tell you if it is needed or not.

This library is supposed to be integrated with a Hadoop, Flink, Hive or Spark application, ie a big data application.

If your application is none of those then maybe a different library can make more sense for you.

Am 17.10.2020 um 18:35 schrieb Steve Phelps notifications@github.com:

It eliminates lots of pointless exception handling code in my application. It is much simpler to just call getTransactionHash() without having a pointless try catch wrapper each and every time I want to get the hash.

Sent from ProtonMail mobile

-------- Original Message -------- On 17 Oct 2020, 17:28, Jörn Franke wrote:

I think there is little benefit in this change.

Am 17.10.2020 um 17:58 schrieb Steve Phelps notifications@github.com:

Also, if for some bizarre reason there was a legitimate IOException thrown within getTransactionHash, the only sensible way for the caller of getTransactionHash to deal with the exception would be to log an error and halt. There is nothing sensible the caller can do to mitigate or handle the exception. So every caller of getTransactionHash has to have exception handling code like:

try { byte[] hash = BitcoinUtil.getTransactionHash(tr); } catch (IOException e) { logger.error("Couldn't get transaction hash: " + e.getMessage()); assert false; // This shouldn't happen and there is nothing sensible we can do so crash with error } This exception handling code will end up more or less duplicated throughout all apps. Therefore by DRY we should eliminate the duplication by pulling the exception-handling logic back into getTransactionHash().

In the very worse-case the proposed change would cause unit-tests to fail with a hypothetical new release of Java that broke the contract that ByteArrayOutputStream does not do any IO. In that very unlikely situation one could easily roll-back the change.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

phelps-sg commented 3 years ago

My application is a Spark application which takes raw block files and computes network metrics over the entire blockchain. I have forked the library and I am trying to make constructive suggestions. I have contributed improvements in the form of pull requests. For every issue that I have opened I have worked on code to address it, which I have committed to my forked repo on github. I have also issues pull requests. If you do not want to receive issues or pull requests from me I am happy to maintain my own forked version of the project at phelps-sg/hadoopcryptoledger. I will of course acknowledge you in the README and adhere to the license. If there are any other acknowledgements you would like me to make please let me know.

phelps-sg commented 3 years ago

I've created a detached fork of the project at https://github.com/phelps-sg/hadoopcryptoledger-plus which contains fixes for some of the recent issues I've opened. I will maintain this as a completely separate repo. If there any attribution or acknowledgement issues please feel free to open an issue there and I will endeavour to respond quickly.

jornfranke commented 3 years ago

Keep in mind that the library is Apache 2.0 licensed. You may want to name it differently so that people are not confused. I closed the issues and pull requests here then. Good luck with your further project.