Open A60AB5450353F40E opened 1 year ago
I made a little tool to experiment with this, using these modes:
Example of patternizing an AnyHedge input script:
STRIP_PUSHES: 56; len=1
STRIP_PUSH_DATA: 01406001406051025401; len=10
EXTRACT_PUSHES: 40da2963cc172e7dccf9570ebd272c496d9df459f1b4d07f1961515642054db764f25c4aab947a4dbcf7793ca25bcc5a46faa83d93e7aeaa2e23a95376e386902d10020aa262ab330100863301002b45000040c0df593545220c40b8676d56388b58715b27cc0fc5de3640dfd49aa05d00e3efc5597b9d8dd783f055b0579630157599290d547e9e3c160d2170d2c300177a72103e0aa262ac3301008733010026450000514d5401043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=508
Applying the same modes to the redeem script (the final 340-byte push of the input script above), we get:
BYTECODE: 043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=340
STRIP_PUSHES: 5d79519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695176517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=156
STRIP_PUSH_DATA: 54545352535501210119011951012101215179519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695276517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=173
EXTRACT_PUSHES: 043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c005c5b515c51515f5f575d5d575c5854005c58545c545b5b5c0059585c545c5b5a5b0222025853575252000055515154; len=231
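For readers who want to experiment, here is a minimal Python sketch of how the three modes could work. It is inferred from the sample output above and is not the actual tool. The assumptions: STRIP_PUSHES collapses each run of consecutive pushes into a push of the run length, STRIP_PUSH_DATA replaces each push with a push of its data size, and single-byte number opcodes (OP_0, OP_1NEGATE, OP_1..OP_16) count as size 1. The function names are hypothetical.

```python
from typing import Iterator, Optional, Tuple

def parse(script: bytes) -> Iterator[Tuple[bytes, Optional[int]]]:
    """Yield (raw_bytes, push_size); push_size is None for non-push opcodes."""
    i = 0
    while i < len(script):
        start, op = i, script[i]
        i += 1
        if op == 0x00 or op == 0x4F or 0x51 <= op <= 0x60:
            # Assumption: number opcodes are treated as 1-byte pushes.
            yield script[start:i], 1
            continue
        if 0x01 <= op <= 0x4B:            # direct push of `op` bytes
            size = op
        elif op in (0x4C, 0x4D, 0x4E):    # OP_PUSHDATA1 / 2 / 4
            width = 1 << (op - 0x4C)
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:                             # any other opcode is kept verbatim
            yield script[start:i], None
            continue
        i += size
        if i > len(script):
            raise ValueError("truncated push")
        yield script[start:i], size

def push_number(n: int) -> bytes:
    """Encode a non-negative integer as a minimal script number push."""
    if n == 0:
        return b"\x00"
    if n <= 16:
        return bytes([0x50 + n])
    # The extra bit of headroom keeps the sign bit of the last byte clear.
    body = n.to_bytes((n.bit_length() + 8) // 8, "little")
    return bytes([len(body)]) + body

def strip_pushes(script: bytes) -> bytes:
    out, run = bytearray(), 0
    for raw, size in parse(script):
        if size is None:
            if run:
                out += push_number(run)
                run = 0
            out += raw
        else:
            run += 1
    if run:
        out += push_number(run)
    return bytes(out)

def strip_push_data(script: bytes) -> bytes:
    return b"".join(raw if size is None else push_number(size)
                    for raw, size in parse(script))

def extract_pushes(script: bytes) -> bytes:
    return b"".join(raw for raw, size in parse(script) if size is not None)

# First three pushes of the redeem script above, followed by its 5c79009c fragment.
sample = bytes.fromhex("043e0aa262041209a26203eab3025c79009c")
print(strip_pushes(sample).hex())     # 5479519c
print(strip_push_data(sample).hex())  # 5454535179519c
print(extract_pushes(sample).hex())   # 043e0aa262041209a26203eab3025c00
```

How runs longer than 16 pushes are encoded by the real tool is not visible in the samples, so the number-push encoding used for them here is a guess.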
To better illustrate, here's an index (pattern, input_count) of contract fingerprints (STRIP_PUSHES mode) from blocks 0-780,000:
https://gist.github.com/A60AB5450353F40E/6b3e525d6e1220328217b9568968d6fc
Thanks for looking into this @A60AB5450353F40E!
This would be a great improvement for scanning contract patterns. I'd love to take a PR introducing this feature! I won't have bandwidth to work on this myself until I make some progress on https://github.com/bitauth/chaingraph/issues/29. (Otherwise, I'll try to implement the bytecode_pattern stuff this way when I'm working on the ClickHouse migration.)
Wrote a little paper about this: https://gitlab.com/0353F40E/smart-contract-fingerprinting
and working on extending BCHN RPC with it: https://gitlab.com/0353F40E/bitcoin-cash-node/-/commits/bcpattern
If we could get this merged into BCHN, you wouldn't have to calculate it with SQL anymore; you could just read the relevant fields from the node RPC.
I was thinking about how to store the redeem script. How about splitting it into 3 fields:
One can then use the _eq operator on the most general pattern, which should perform better than a regex. The result could then be narrowed down further with a regex on push sizes or pushes, and those regexes would only be executed on positive matches for the general template. Also, the redeem script can be accurately reconstructed from these fields.
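The two-stage lookup described here can be sketched in Python as a rough illustration. The field names (`pattern`, `push_sizes`, `pushes`) and the in-memory index are assumptions made for the example, not the proposed schema, and the rows are made-up toy values.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Row(NamedTuple):
    pattern: str     # hypothetical field: script with pushes stripped (hex)
    push_sizes: str  # hypothetical field: one size push per original push (hex)
    pushes: str      # hypothetical field: the pushes themselves (hex)

def build_index(rows: List[Row]) -> Dict[str, List[Row]]:
    """Group rows by their most general pattern for exact-match lookup."""
    index: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        index[row.pattern].append(row)
    return index

def find(index: Dict[str, List[Row]], pattern: str, sizes_regex: str) -> List[Row]:
    """Stage 1: equality on the pattern. Stage 2: regex only on those hits."""
    rx = re.compile(sizes_regex)
    return [row for row in index.get(pattern, []) if rx.fullmatch(row.push_sizes)]

# Toy rows for illustration only.
rows = [
    Row("5479519c", "5454535151", "043e0aa262041209a26203eab3025c00"),
    Row("5479519c", "5254535151", "02e533041209a26203eab3025c00"),
    Row("56", "01406001406051025401", ""),
]
index = build_index(rows)
hits = find(index, "5479519c", r"5454..5151")
print([row.pushes for row in hits])  # ['043e0aa262041209a26203eab3025c00']
```

In a database the first stage would be an indexed equality comparison, so the regex never touches rows whose general template differs.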
We could even do some more parsing and have a function to filter for the exact value of the Nth push.
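A filter on the Nth push could look roughly like the following Python sketch. It assumes the pushes are stored as push-only bytecode (as in the EXTRACT_PUSHES output) and that number opcodes decode to their script number encoding. The function names are hypothetical.

```python
from typing import List

def split_pushes(pushes: bytes) -> List[bytes]:
    """Decode push-only bytecode into the list of pushed values."""
    values: List[bytes] = []
    i = 0
    while i < len(pushes):
        op = pushes[i]
        i += 1
        if op == 0x00:                    # OP_0 pushes an empty item
            values.append(b"")
            continue
        if op == 0x4F:                    # OP_1NEGATE pushes -1 (0x81)
            values.append(b"\x81")
            continue
        if 0x51 <= op <= 0x60:            # OP_1..OP_16 push the number itself
            values.append(bytes([op - 0x50]))
            continue
        if 0x01 <= op <= 0x4B:            # direct push of `op` bytes
            size = op
        elif op in (0x4C, 0x4D, 0x4E):    # OP_PUSHDATA1 / 2 / 4
            width = 1 << (op - 0x4C)
            size = int.from_bytes(pushes[i:i + width], "little")
            i += width
        else:
            raise ValueError(f"not a push opcode: {op:#04x}")
        if i + size > len(pushes):
            raise ValueError("truncated push")
        values.append(pushes[i:i + size])
        i += size
    return values

def nth_push_equals(pushes: bytes, n: int, value: bytes) -> bool:
    """True if the n-th push (0-based) exists and equals `value`."""
    values = split_pushes(pushes)
    return 0 <= n < len(values) and values[n] == value

stored = bytes.fromhex("043e0aa262041209a26203eab3025c00")
print(nth_push_equals(stored, 2, bytes.fromhex("eab302")))  # True
print(nth_push_equals(stored, 3, b"\x0c"))                  # True
```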