Error: Size greater than stream: 38133322 > 56518

donnlee commented 7 months ago

I like this library a lot, but I have hit a blocker. Tx.decode() fails with this error for some transactions that have already been confirmed. For the txn in my repro below, the error message is always the same: 38133322 > 56518.

I've noticed this tends to happen with txns that have many inputs or many outputs.

Here's my repro script (node 19.9.0):

// Version 1.4.4 from npm.
import { Tx } from "@cmdcode/tapscript";
import * as lib_bitcoin_rpc from './lib_bitcoin_rpc.js'

async function main() {
  // The following txid causes a Tx.decode() crash.
  const txid = '74424af880bad9a82b0969b241fc8ee15db56ef5b852fc81649a7545a1e27036'
  // The following txid does not cause a crash.
  //const txid = '1cd87feb39f3696a0573cdfc5b981f6040954817fb1d59a082fec749693625b9'

  // Get txn as hexstring.
  const { result: rawTxn } = await lib_bitcoin_rpc.getRawTransaction(txid)
  console.log(rawTxn)

  const tx = Tx.decode(rawTxn)
  console.log('Done.')
}

main()

Output:

020000000001fd24021615950c0d6e983937ca8cf33e8b94ec90b51b67868e66e1626170f942b2b2e80500000000fdffffff374f40f1b0b08e7f7ff144a5266c7e9ecdc3b8d30a7afec3ebd1b0e927ad3f812200000000fdffffff94090c725875db7194e1f47b2f1f1f8ef9c3c4aea3ea487e1c8b66fbadbfbaa60300000000fdfffffffb21f9f7c8b3028afac209af3c7d14e57e6e7dd3b45a452b37324892c71da8100600000000fdffffff8234d40bdb83594b8f9f6bc0f3889007f0ebfaea46bd38fb47de59a85821642b0100000000fdffffff60811e1de61a40b4ad98b55562461d5f380163d8682a1c39
...
c68c914817c641f4a29ae40121023e043fd3e13df9c2a8640adc0d07842fb490f93c5f307ce7d859ed277b83e14f16ae0c00
file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1069
            throw new Error(`Size greater than stream: ${size} > ${this.size}`);
                  ^

Error: Size greater than stream: 38133322 > 56518
    at Stream.peek (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1069:19)
    at Stream.read (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1075:28)
    at readData (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:6684:18)
    at readScript (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:6688:18)
    at readInput (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:6651:20)
    at readInputs (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:6643:21)
    at Object.decodeTx [as decode] (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:6613:17)
    at main (file:///home/donn/workspace/gitlab.com/proj/inscribe/hello_repro_decode_txn_error_for_publishing_to_issue.js:18:17)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v19.9.0

I hope this can be investigated, because I would hate to port all my code to another lib. Please let me know if there's more info I can provide. Thank you!

cmdruid commented 7 months ago

Thank you for reporting this. There was an endianness issue with parsing the input count in a tx, which reared its ugly head with a very large set of inputs.
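For context, the input count is itself a CompactSize varint, so its byte order matters. The raw hex in the report starts with version 02000000, the segwit marker and flag 00 01, and then the input count fd 24 02. A minimal sketch in plain JavaScript (the helper names are hypothetical, not tapscript code) shows what the two byte orders give: little-endian, as Bitcoin serialization requires, yields 548 inputs, while big-endian would yield 9218, sending the parser looking for far more inputs than the transaction contains. This only illustrates the class of bug described; it is not the library's actual code.

```javascript
// Sketch: decode the input count at the start of the failing transaction.
// Layout: version (4 bytes) | marker 0x00 | flag 0x01 | input count (CompactSize).
const rawPrefix = '020000000001fd2402'; // first bytes of the failing tx hex

// Convert a hex string to an array of byte values.
function hexToBytes(hex) {
  const out = [];
  for (let i = 0; i < hex.length; i += 2) out.push(parseInt(hex.slice(i, i + 2), 16));
  return out;
}

// Interpret bytes as an unsigned integer, least significant byte first.
function readLE(bytes) {
  return bytes.reduceRight((acc, b) => acc * 256 + b, 0);
}

// Same bytes, most significant byte first (the wrong order for Bitcoin).
function readBE(bytes) {
  return bytes.reduce((acc, b) => acc * 256 + b, 0);
}

const bytes = hexToBytes(rawPrefix);
const countPrefix = bytes[6];         // 0xfd: a 2-byte count follows
const countBytes = bytes.slice(7, 9); // [0x24, 0x02]

console.log(countPrefix.toString(16)); // fd
console.log(readLE(countBytes));       // 548
console.log(readBE(countBytes));       // 9218
```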

I released version 1.4.5, which fixes this issue. Let me know if you run into any further problems.

donnlee commented 7 months ago

You rule. All is good now with 1.4.5. Thank you for the awesome lib!

donnlee commented 7 months ago

Oh, I might have found another one. Error report coming.

donnlee commented 7 months ago

I'm getting a similar error when I try to decode a script with:

const script = Script.decode(scriptAsHexString)

txid: 9b273c9880fcfa8da4cf5d202710b546f9c2f67b1a203d58aefd97964100ba72

Error:

file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1069
            throw new Error(`Size greater than stream: ${size} > ${this.size}`);
                  ^

Error: Size greater than stream: 4113302305 > 74
    at Stream.peek (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1069:19)
    at Stream.read (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1075:28)
    at decodeWords (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1409:35)
    at Object.decodeScript [as decode] (file:///home/donn/workspace/gitlab.com/proj/inscribe/node_modules/@cmdcode/tapscript/dist/module.mjs:1378:12)
    at main (file:///home/donn/workspace/gitlab.com/proj/inscribe/hello_repro_decode_txn_error_for_publishing_to_issue_decode_script.js:34:25)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v19.9.0

donnlee commented 7 months ago

In that tx, scriptAsHexString is:

4e21032cf515e2ae74b6d639c4e91a0b2a7047a0178e628d167989af46d1da474a6951ad21022d6b369d9a95568203b2a51eac49cc8b20ab81930e2d2565df5f0bc8e3bf59d5ac73640380ca00b268

this is the raw tx:

020000000001044a09bb8a75db2e360b8ef45e67b00e4e81864682c86d68b432f72123341b1146000000002322002042ac202bdb3601126e3325c2cabb142274da06209f47eb595a4c3e587fe8fe5bfdffffff2c1abe84242293c61441fa7cb5dd10917764b6bd00e0c5b0c97a03eee019f931180000002322002003cfb25e10b99f9583e16b826038cc0626abc09317fb161d74a54e39bea8b0d7fdffffff09c2ebf90f94e3638de23ef07d2130df43a1301c44d4aed7fd572f46857f1da2000000002322002042ac202bdb3601126e3325c2cabb142274da06209f47eb595a4c3e587fe8fe5bfdffffff2c1abe84242293c61441fa7cb5dd10917764b6bd00e0c5b0c97a03eee019f9310d0000002322002003cfb25e10b99f9583e16b826038cc0626abc09317fb161d74a54e39bea8b0d7fdffffff01bfdd94000000000017a91416d4ccd3475f6754c8e143762da83bdaaa8afb80870347304402201259511741fc8262c509475f7746ab607f3d11bbf33f22b87beec527e159b8130220370f0e3487cc72ce7387d255d40eda26d498a7079a4afe32ec749b4ae102027201473044022034aa7d0524c8fee7228a75880489b13239dbb9af4854aa30ba4e66fe2ebc648a02206f7bec4d914ddf6f03159de8253f380b07a98b09b4e0ed5f1d9f1ffca771d02b014e21032cf515e2ae74b6d639c4e91a0b2a7047a0178e628d167989af46d1da474a6951ad21022d6b369d9a95568203b2a51eac49cc8b20ab81930e2d2565df5f0bc8e3bf59d5ac73640380ca00b26803473044022042584b95f101f8e09081ec8a4ce103b6a160ee144621cc1ec44300e199bf53d702206ca84eda2b54ece36274852aa8a221b1a7e109af0c85e853576134ac5b19707001473044022052d312808d7c0f79353f8f06b4bde16356033db8a29fbc422a4270ef7b6bca1f022037776cb91a73f9f4e82b8758331d04b116b2d156e9cba8b13032ad2c28c0ea08014e2102c58d36a062a688338cce2cd768e8158d13f589a0cfe76cff772ef545133f7635ad210313f87616d06f8345a2ae67e865824229f0ea2808dd48fc5359d75154e3116829ac73640380ca00b2680347304402202f1995ef920e717b5e58c1f01e1397645e2be3d77d5d87d235147539848a230e0220562b7419f770c051dbe6e9aa8b67cbe1c67571e7ade8514bc5290990687a4cce014730440220346bc4f853c2de23b8d224440e87def9b95c99500c88c2d45715428128d9555402203fe5d447e4031c9ded575a82a25606ca57a176445e3decfb0fe082c1e446bbe8014e21032cf515e2ae74b6d639c4e91a0b2a7047a0178e628d167989af46d1da474a6951ad21022d6b369d9a95568203b2a51eac49cc8b20ab81930e2d2565df5f0bc8e3bf59d5ac73640380ca00b26803473044022001556d2f42d3d03a3f155470f5696b8533ff5ad7c937d0b8dfc0c36fba74a1520220329af99245479c3bb60d547b254f4df8b96bf1e7c0658ac8789543379eb51f870147304402202d05e07133f7fb7def6011677588e4ca0917d180b6c80036cac22111c29eef6e02203af8fa8a4b12a380c8e42e51dece0867c1aec1aa613a731ecb7b2116d8356ef3014e2102c58d36a062a688338cce2cd768e8158d13f589a0cfe76cff772ef545133f7635ad210313f87616d06f8345a2ae67e865824229f0ea2808dd48fc5359d75154e3116829ac73640380ca00b26843af0c00

donnlee commented 7 months ago

This is my repro script. I don't know how to isolate the witness script in fewer steps, so please tell me if this is wrong.

import { Script, Tx } from "@cmdcode/tapscript";
import * as lib_bitcoin_rpc from './lib_bitcoin_rpc.js'
import * as lib_hexutils from './lib_hexutils.js'

async function main() {

  let tx

  // The following txid causes a decode script crash.
  const txid = '9b273c9880fcfa8da4cf5d202710b546f9c2f67b1a203d58aefd97964100ba72'
  // The following txid does not cause a crash.
  //const txid = '314517d30f170fe74e39aeea8f85246330184900009ba5ea6b3a8d2c080746fe'

  const { result: rawTxn } = await lib_bitcoin_rpc.getRawTransaction(txid)  // Get as hexstring.
  tx = Tx.decode(rawTxn)
  console.log('Tx decode done.');
  console.log(rawTxn);

  const inputs = tx.vin
  const input = inputs[0]
  console.log(input);
  const witWithNamedValues = Tx.util.readWitness(input.witness)
  console.log('witWithNamedValues:', witWithNamedValues);

  let scriptAsHexString
  if (witWithNamedValues.script) {
    const scriptUint8arr = witWithNamedValues.script
    scriptAsHexString = lib_hexutils.uint8arrayToHexstring(scriptUint8arr)
    console.log('scriptAsHexString:', scriptAsHexString);
  }

  const script = Script.decode(scriptAsHexString)
  console.log('Done.');
}

main()

cmdruid commented 7 months ago

Quoting: "Getting similar error when i try to decode a script with:"

The 4e in front of the script is a size byte. If you drop that byte, the script will parse correctly.
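The numbers in the script-decode error are consistent with this explanation. In Script, the byte 0x4e is the opcode OP_PUSHDATA4, which is followed by a 4-byte little-endian push length. A minimal sketch in plain JavaScript (helper names are hypothetical), assuming the decoder read the leading 4e that way: the next four bytes 21 03 2c f5 give 4113302305, and only 79 - 5 = 74 bytes remain, matching "4113302305 > 74". Read as a one-byte size prefix instead, 0x4e is 78, exactly the number of bytes that follow it.

```javascript
// The witness script from the failing tx, including its leading size byte.
const scriptHex =
  '4e21032cf515e2ae74b6d639c4e91a0b2a7047a0178e628d167989af46d1da474a6951ad' +
  '21022d6b369d9a95568203b2a51eac49cc8b20ab81930e2d2565df5f0bc8e3bf59d5ac73' +
  '640380ca00b268';

// Convert a hex string to an array of byte values.
function hexToBytes(hex) {
  const out = [];
  for (let i = 0; i < hex.length; i += 2) out.push(parseInt(hex.slice(i, i + 2), 16));
  return out;
}

const bytes = hexToBytes(scriptHex);

// Reading 1: 0x4e is a one-byte size prefix for the script that follows.
const declaredLength = bytes[0];     // 78
const bodyLength = bytes.length - 1; // bytes after the prefix

// Reading 2: 0x4e is OP_PUSHDATA4, so the next 4 bytes are a little-endian push length.
const OP_PUSHDATA4 = 0x4e;
const pushLength = bytes.slice(1, 5).reduceRight((acc, b) => acc * 256 + b, 0);
const remaining = bytes.length - 5;  // bytes left after opcode + 4-byte length

console.log(bytes[0] === OP_PUSHDATA4);  // true
console.log(declaredLength, bodyLength); // 78 78
console.log(pushLength, remaining);      // 4113302305 74
```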

In v1.4.6 I updated Script.decode() so that if you pass true as the second parameter, it parses a script that starts with a size byte such as 4e. You can use this boolean to turn size-byte parsing on or off.

Let me know if this works. :-)

donnlee commented 7 months ago

Aha! mempool.space doesn't show the size byte, so now I know I need to compare against the raw script in hex. It works great with Script.decode(foo, true). Thank you @cmdruid!

donnlee commented 7 months ago

Hmmm, I'm getting a lot of these errors now:

--- blk 831442: txid cb3a8aa2ea68dd284c4902e502a2e0bf72c721c7dc23c7c4658b84591d1e3682
Error: script decode error: Error: Varint does not match stream size: 3380 !== 3382

I'm wondering if https://github.com/cmdruid/tapscript/commit/ebba2f5c86b22cca12d61efb530c4b67da0a0cd7 is causing this. It may be that the script hex string sometimes contains the size byte and sometimes does not. I'll investigate.

donnlee commented 7 months ago

ChatGPT said the size prefix is encoded differently when the size is greater than 252:

The general format for encoding the length of the witness script in hex is as follows:

If the length is 0 to 252 (0xfc in hex), it is represented directly as a single byte.

Example: If the witness script length is 42, it is encoded as 0x2a.

If the length is 253 to 65,535 (0xfd to 0xffff), it is represented as 0xfd followed by a 2-byte little-endian integer.

Example: If the witness script length is 500, it is encoded as 0xfd, 0xf4, 0x01 (little-endian representation of 500).

I see this in txid b5cadf0f746b6a899de64e21e938a519015b6f12b25d5ce9d0b37931c22da5ce, which throws "Varint does not match stream size: 501 !== 503".

And yes, this script begins with fd:

fdf50120cf2f6edeef046f8ae6c2a6d4306dffef00b5250108d0dfda3d383fcd2c638d6cac0063036f7264010117746578742f68746d6c3b636861727365743d7574662d38004dae013c21444f43545950452068746d6c3e0a3c68746d6c206c616e673d22656e223e0a3c686561643e0a20203c6d65746120636861727365743d225554462d3822202f3e0a20203c6d657461206e616d653d2276696577706f72742220636f6e74656e743d2277696474683d6465766963652d77696474682c20696e697469616c2d7363616c653d312e3022202f3e0a20203c7469746c653e416273747261637420417274202d2041746f6d6963204d6f64656c206279206f726442616e6b73793c2f7469746c653e0a3c2f686561643e0a3c626f6479207374796c653d226d617267696e3a20307078223e0a20203c6469763e0a202020203c696672616d65207374796c653d2277696474683a313030253b206865696768743a31303076683b206d617267696e3a3070783b20626f726465723a6e6f6e653b22207372633d222f636f6e74656e742f303036636332306462356363633433353536323164623336353766653063326631303534353266313762326136323863376634613166653133373233323634356930223e3c2f696672616d653e0a20203c2f6469763e0a3c2f626f64793e0a3c2f68746d6c3e68
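Working the prefix by hand: fd means a 2-byte little-endian length follows, so f5 01 is 0x01f5 = 501, and the script proper starts after three bytes (six hex characters). A minimal sketch of just that case; `readFdLength` is a hypothetical helper, not a tapscript API.

```javascript
// Sketch: decode a 0xfd-prefixed CompactSize length from the start of a hex string.
function readFdLength(hex) {
  if (hex.slice(0, 2).toLowerCase() !== 'fd') throw new Error('not a 0xfd prefix');
  const lo = parseInt(hex.slice(2, 4), 16); // least significant byte comes first
  const hi = parseInt(hex.slice(4, 6), 16);
  return hi * 256 + lo;
}

const scriptPrefix = 'fdf50120cf2f'; // start of the script hex shown above
console.log(readFdLength(scriptPrefix)); // 501
console.log(scriptPrefix.slice(6));      // 20cf2f (script body begins after 3 bytes)
```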

Does it make sense for this lib to handle this case? If not, I can check for fd in my code and slice away the first 3 bytes.

donnlee commented 7 months ago

I guess Tx.util.readWitness(input.witness).script always includes the size bytes (plural now), so at least it's deterministic and I can handle that.

donnlee commented 7 months ago

And this is what ChatGPT said about even larger sizes:

If the length is 65,536 to 4,294,967,295 (0x10000 to 0xffffffff), it is represented as 0xfe followed by a 4-byte little-endian integer.

Example: If the witness script length is 70,000, it is encoded as 0xfe, 0x70, 0x11, 0x01, 0x00 (little-endian representation of 70,000 = 0x00011170).

If the length is greater than or equal to 4,294,967,296 (0x100000000), it is represented as 0xff followed by an 8-byte little-endian integer.

Example: If the witness script length is 5,000,000,000, it is encoded as 0xff, 0x00, 0xf2, 0x05, 0x2a, 0x01, 0x00, 0x00, 0x00 (little-endian representation of 5,000,000,000 = 0x000000012a05f200).

These encoding rules allow for a compact representation of the witness script length, adapting to the specific requirements of the length value. The encoded length is then followed by the actual witness script data in the transaction's witness field.
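A minimal JavaScript sketch of the four cases quoted above, handy for checking worked examples by hand. `encodeCompactSize` is a hypothetical helper, not a tapscript API; it uses BigInt so the 8-byte case stays exact. For 70,000 it yields fe 70 11 01 00, and for 5,000,000,000 it yields ff 00 f2 05 2a 01 00 00 00.

```javascript
// Sketch: encode a non-negative integer as a Bitcoin CompactSize, returned as hex.
function encodeCompactSize(n) {
  const v = BigInt(n);
  if (v < 0n) throw new RangeError('negative length');
  // Write `width` bytes of v, least significant byte first.
  const le = (width) => {
    let out = '';
    for (let i = 0; i < width; i++) {
      out += ((v >> BigInt(8 * i)) & 0xffn).toString(16).padStart(2, '0');
    }
    return out;
  };
  if (v <= 0xfcn) return le(1);              // 0..252: the byte itself
  if (v <= 0xffffn) return 'fd' + le(2);     // 0xfd + 2-byte LE
  if (v <= 0xffffffffn) return 'fe' + le(4); // 0xfe + 4-byte LE
  if (v <= 0xffffffffffffffffn) return 'ff' + le(8); // 0xff + 8-byte LE
  throw new RangeError('value does not fit in 8 bytes');
}

console.log(encodeCompactSize(42));         // 2a
console.log(encodeCompactSize(500));        // fdf401
console.log(encodeCompactSize(70000));      // fe70110100
console.log(encodeCompactSize(5000000000)); // ff00f2052a01000000
```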

donnlee commented 7 months ago

Ref: https://bitcoin.stackexchange.com/questions/110808/reference-to-segwit-raw-transation-format

CompactSize: serialization of an unsigned integer in a variable number of bytes:

0 ≤ n ≤ 0xFC: serialized as [n] directly (one byte).

0xFD ≤ n ≤ 0xFFFF: serialized as [0xFD] + LE16(n) (3 bytes).

0x10000 ≤ n ≤ 0xFFFFFFFF: serialized as [0xFE] + LE32(n) (5 bytes).

0x100000000 ≤ n ≤ 0xFFFFFFFFFFFFFFFF: serialized as [0xFF] + LE64(n) (9 bytes). This isn't actually used, as no structure this big would fit in a block.
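Going the other way, a minimal sketch of reading a CompactSize from the start of a hex string per these rules. `readCompactSize` is a hypothetical helper, not a tapscript API. It returns the value and how many bytes the prefix occupied, and it rejects non-minimal encodings such as fd 05 00 for 5, as Bitcoin Core's deserializer does.

```javascript
// Sketch: read a CompactSize from the start of a hex string.
// Returns { value: BigInt, size: number of prefix bytes }.
function readCompactSize(hex) {
  const byteAt = (i) => {
    const s = hex.slice(2 * i, 2 * i + 2);
    if (s.length !== 2) throw new RangeError('hex ends inside the CompactSize');
    return BigInt(parseInt(s, 16)); // BigInt(NaN) throws on non-hex input
  };
  const first = Number(byteAt(0));
  // Number of little-endian bytes that follow each marker byte.
  const width = first === 0xfd ? 2 : first === 0xfe ? 4 : first === 0xff ? 8 : 0;
  if (width === 0) return { value: BigInt(first), size: 1 };

  let value = 0n;
  for (let i = 0; i < width; i++) value |= byteAt(1 + i) << BigInt(8 * i);

  // Smallest value that legitimately needs each width.
  const min = { 2: 0xfdn, 4: 0x10000n, 8: 0x100000000n }[width];
  if (value < min) throw new RangeError('non-canonical CompactSize');
  return { value, size: 1 + width };
}

console.log(readCompactSize('4e2103'));   // { value: 78n, size: 1 }
console.log(readCompactSize('fdf50120')); // { value: 501n, size: 3 }
```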

This is mostly for my own future reference in this GitHub issue.

cmdruid commented 6 months ago

Are you still having parsing issues?

donnlee commented 6 months ago

Thanks for asking. I'm good now that I've written a function that removes the size bytes according to the spec (all VarInt cases). My function is:

function removeSizeBytes(scriptAsHexString) {
  // Removes the leading size bytes from Tx.util.readWitness(input.witness).script
  // We must remove the size bytes before we .decode(script)
  // Size of the script (in bytes) is in "VarInt" (CompactSize) format.
  // scriptAsHexString: Script in hexstring WITH LEADING SIZE BYTES.
  // https://github.com/cmdruid/tapscript/issues/33
  if (!scriptAsHexString) return

  // Examine the 1st byte (each byte is 2 chars of the hex string):
  const firstByte = scriptAsHexString.slice(0,2).toLowerCase()
  // 0xfd: marker + 2 length bytes. Remove the first 3 bytes.
  if (firstByte === 'fd') return scriptAsHexString.slice(6)
  // 0xfe: marker + 4 length bytes. Remove the first 5 bytes.
  if (firstByte === 'fe') return scriptAsHexString.slice(10)
  // 0xff: marker + 8 length bytes. Remove the first 9 bytes.
  if (firstByte === 'ff') return scriptAsHexString.slice(18)
  // Otherwise the 1st byte is the length of the script itself. Remove it.
  return scriptAsHexString.slice(2)
}

Any feedback on this code? I'm not a fan of slice()'ing unsafely, but the input data should be clean because it comes from Tx.util.readWitness().

cmdruid commented 5 months ago

I apologize for the late reply. I'm not a fan of slicing unsafely either. I see that you are looking at the first byte, which should always be the start of a varint if you are handling hex data from the witness. I think you are safe with this assumption.
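If you want the slicing to fail loudly instead of silently, here is a minimal sketch of a stricter variant of the function above. It assumes the same hex-string input. `removeSizeBytesChecked` is a hypothetical name, and the ff branch is included only for completeness, since no witness item that large could appear in a block. It decodes the declared length and throws unless exactly that many bytes follow the prefix. With clean input from the witness, the check should never fire.

```javascript
// Sketch: strip a CompactSize length prefix, but only after checking that the
// declared length matches the number of bytes that actually follow it.
function removeSizeBytesChecked(scriptAsHexString) {
  if (!scriptAsHexString) return undefined;
  const first = scriptAsHexString.slice(0, 2).toLowerCase();
  // Number of little-endian length bytes after the marker byte (0 = no marker).
  const width = first === 'fd' ? 2 : first === 'fe' ? 4 : first === 'ff' ? 8 : 0;

  let declared;
  if (width === 0) {
    declared = parseInt(first, 16); // the first byte is the length itself
  } else {
    declared = 0;
    for (let i = width - 1; i >= 0; i--) { // most significant byte is last
      declared = declared * 256 + parseInt(scriptAsHexString.slice(2 + 2 * i, 4 + 2 * i), 16);
    }
  }

  const body = scriptAsHexString.slice(2 * (1 + width));
  if (Number.isNaN(declared) || body.length !== 2 * declared) {
    throw new Error(`size prefix says ${declared} bytes, found ${body.length / 2}`);
  }
  return body;
}

console.log(removeSizeBytesChecked('03aabbcc')); // aabbcc
```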

cmdruid / tapscript

Error: Size greater than stream: 38133322 > 56518 #33