Decode raw transaction into transaction object

sidazhang commented 10 years ago

This is my script to decode a raw transaction from buffer.

This as it stands is not complete, I plan to add more tests.

@MidnightLightning, Do you feel like that this decode function should be in this module? I think it would make sense for the transaction module to not only be able to serialize but also be able to decode

My goal:

Be able to decode transaction
Be able to verify signatures
Be able to verify transaction to be valid (https://en.bitcoin.it/wiki/Protocol_rules#.22tx.22_messages) - this obviously involves storage. so I plan to have anything that involve the blockchain to be arguments. For example scriptPubkey of a referenced output should be supplied to this module in order to validate the transaction.

The final DSL should be something like this

var tx = Transaction.decode(rawBuf)
// referencedOutputs - should contain the outputs and also whether they have been spent
tx.validate(referencedOutputs)
// => boolean

MidnightLightning commented 10 years ago

Validating signatures will require the ECDSA library, and validating that no inputs have been already spent will require a Blockchain library. In your code example you state that referencedOutputs has "the outputs and also whether they have been spent"; are you presuming those to come from a Blockchain library? So some sort of method like: Blockchain.getOutputs(hashArray) that returns metadata about those txOuts?

sidazhang commented 10 years ago

I am presuming that the user will get those referencedOutputs from somehwere either from Blockchain.getOutputs or maybe another database that they are using. So the btc-transactions lib shouldn't refer to Blockchain.getOutputs and let the user to provide those referencedOutputs.

A referenced output could take the following format

{
   'scriptPubKey': '76xxxxxx',
   'value': 100,
   'spent': true/false
}

So a user could use Blockchain.getOutputs or get their own data somewhere, as long as it matches this format.

Thoughts?

MidnightLightning commented 10 years ago

For the specific validation step of "are the inputs unspent", I'd pose reversing the roles of which object has the validation method, and which gets passed in. Since that validation has to be in the context of a blockchain, I'd structure it as Blockchain.validate(tx). Internally the Blockchain object would verify the unspent status of the inputs used, but that way you don't have to dictate the format of that response, just the structure of the Transaction object being passed in, which the Blockchain object will already have to know to operate.

And for that validation, even if it fails once (one of the inputs is already spent), there's a chance that the block that the input was spent in becomes orphaned, and the transaction becomes valid at a future time (very rare, but possible). If you want to account for that possibility, you could hold a structurally-valid (passes all other validation steps) in memory for a while and periodically check and see if it validates with the Blockchain yet (meaning, failing that one validation step doesn't mean it has to be a fatal validation error).

sidazhang commented 10 years ago

I think that's also valid. In that case I think we can have referencedOutputs to be an array of TransactionOuts

Although I would still like Transaction to have the ability to validate itself given valid and unspent referenced outputs

The reason is that users might not want to need to run a node to validate a transaction. (It is a pretty heavy process and all the data can be found on with a standard api like blockchain.info api). A user for example, might only want to run a lightweight checker to see if the transaction is valid.

Of the steps required to validate a transaction: https://en.bitcoin.it/wiki/Protocol_rules

We need blockchain data for:

11. For each input, look in the main branch and the transaction pool to find the referenced output transaction. If the output transaction is missing for any input, this will be an orphan transaction. Add to the orphan transactions, if a matching transaction is not in there already.
12. For each input, if the referenced output transaction is coinbase (i.e. only 1 input, with hash=0, n=-1), it must have at least COINBASE_MATURITY (100) confirmations; else reject this transaction
14. For each input, if the referenced output has already been spent by a transaction in the main branch, reject this transaction[6]

So basically, it is one look up to check the prev_tx and also check the confirmation count.

What do you think? Or maybe the validation logic should be in the blockchain module? Maybe there is a way that we can pass the data in rather than forcing the user to run a node?

MidnightLightning commented 10 years ago

I agree the validation logic (for those three steps) should be in the Blockchain module. For what I have been saying, "Blockchain module" !== "node"; you could have a blockchain module that's disconnected from the network and just looking at a cached database, or a blockchain module that doesn't have its own database and instead queries an outside API for all inquiries.

sidazhang commented 10 years ago

Alright, I agree with you. I think that works well.

In such case, I do think it Blockchain module would be a good place for this. We could have the blockchain module do all the validations too. So long we could do it in such a way that let's user flexibly choose which api / or data source they would like

MidnightLightning commented 10 years ago

Right; from the transaction's perspective, it just needs to know there's a Blockchain.validateTx(tx) method and it doesn't have to know how it works. When creating a Blockchain object, you have to tell it where its data source is, from a few different options. That's my plan for the btc-blockchain library anyway!

sidazhang commented 10 years ago

Also included a small test.

I am yet to handle the issue of trying to read beyond buffer length, although I don;t really want to use an eventemitter for this

MidnightLightning commented 10 years ago

I've updated my parsing functions in the parsing-handling branch of btc-p2p if you want to take a peek there. While an event emitter would allow giving a failure response sooner, it requires a value to be returned as a callback rather than a return value, even through it's an asynchronous action. I've updated my method to an internal state property to determine failure, which means it doesn't return early on failure, but at least gracefully fails out.

sidazhang commented 10 years ago

@MidnightLightning I took a brief look at the struct object. It seems like if it fails to read the buffer it will return an internal state of "failed"

I like it. How do you feel about breaking this out into a separate module for all buffer reading?

MidnightLightning commented 10 years ago

The Struct and Message objects might be useful as a separate library for encoding/decoding binary messages. bitcoinjs-server uses the binary library, but an older version that allows both assembling and disassembling binary messages. Do you know of any other libraries like that which already exist, or should we make one custom for cryptocoin applications?

sidazhang commented 10 years ago

does the binary library not assemble message now? if it is no longer suitable. let's have our own package the reason is that bitcoin binary parsing is fairly limited we don't need a lot of functionalities.

Can I suggest to start off with a btc-binary package which will contain struct in it and then we can move other stuff such as message into it later?

I will point transaction deserialization against that package immediately.

I can do it too if you prefer?

MidnightLightning commented 10 years ago

I'd agree to a crypto-binary package (it's not specific to Bitcoin, but works for all cryptocurrencies, so doesn't need the btc label), and the two objects should probably be renamed to ParseMessage and BuildMessage to make it a bit clearer what they are doing. I can start that library, and just pull in the existing code as a starting point.

MidnightLightning commented 10 years ago

New binary message library has been created as crypto-binary

sidazhang commented 10 years ago

thanks @MidnightLightning I will make changes accordingly

sidazhang commented 10 years ago

@MidnightLightning, I just finished the refactor.

I am still converting the buffer back into a hex string because that is currently how the transaction package expects. Once we start the transition to buffer I will make a change here as well.

https://github.com/cryptocoinjs/btc-transaction/commit/13ab3d3b94c8456f1b04484c430bd4a4c7e518b3#diff-e652b5ccb2708f628a9de325cf105fb7R477

I have also added a test case for incorrect buffer

sidazhang commented 10 years ago

@MidnightLightning fixed

MidnightLightning commented 10 years ago

Okay, tests all run now; ACK from me!

sidazhang commented 10 years ago

@MidnightLightning thanks! :+1:

cryptocoinjs / btc-transaction

Decode raw transaction into transaction object #3