cculianu / Fulcrum

A fast & nimble SPV Server for BCH, BTC, and LTC
Other
341 stars 77 forks source link

Implementing blockchain.outpoint.subscribe method #104

Open Pantamis opened 2 years ago

Pantamis commented 2 years ago

Hi,

I can't start without saying that your work on Fulcrum is impressive and deserves much more attention. My experience with it is so much better than ElectrumX both during sync and once synced ! It is a really impressive work !

I would like to know how hard it would be to implement blockchain.outpoint.subscribe method from Electrum 1.5 in Fulcrum. The main application would be to show the spending transaction of an outpoint in a block explorer. You can look at my implementation at janoside/btc-rpc-explorer#356.

Implementing blockchain.outpoint.subscribe is straightforward in ElectRS because it builds a transaction inputs' index as explained here. Such implementation in ElectRS is available at romanz/electrs#454. If I understand well which indexes Fulcrum build, there is no such index so it won't be that easy. I don't wish to take the path of building a new index of spending transactions because building the indexes is a sufficiently painful process as it is already. That's said I would gladly try to sync and use a version of Fulcrum with such index if you implement it in the future.

However, Fulcrum builds already a lot more indexes than ElectRS and I think it is possible to have a quite efficient solution without building a new one. My idea is to use the scripthash of the outpoint.

  1. Using scripthash_history index, Fulcrum will get all the txnum of the scripthash of the outpoint.
  2. Using txhash2txnum, it can also get the txnum of the funding transaction. Now we use the fact that in BTC blockchain, transactions in the same block are ordered in topological order, so we know that the txnum of the spending transaction MUST be greater than the txnum of the funding transaction. I guess you want to keep Fulcrum fully compatible with BCH, I don't follow BCH protocol change but I know it used CTOR for a while, so in this case the check must be relaxed a bit: the spending transaction has a txnum higher than the first txnum of the block including the funding transaction (it is a rather small performance loss in the BTC case to keep everything compatible).
  3. Fulcrum load every transactions returned by the scripthash_history index that have a greater txnum than the funding transaction and check if they spend the outpoint of the query. If it finds the spending transaction, returns it.

This is clearly suboptimal solution. It can also be a terrible DoS vector if we query the spending transaction of an outpoint with an highly used scripthash that only spend it after many other transactions involving this scripthash. I think if the outpoint is not spent we can use the scripthash_unspent index to detect it and avoid the almost exhaustive search among all transactions involving a scripthash. That said, I would like to try and see if it is more efficient than ElectRS which must always load full blocks to answer each query of an outpoint, it can be terribly slow if we want to look at all the spending transactions of the 4000 outpoints of a transaction like d22c53dd5a79fb6cc7a10ed6531761ae43ce1ce2ecd1ebbc5746ba82c71e6d70

I would like to have your opinion on this idea, do you think it is worth to try ? Do you have a better idea ?

Thanks again for the amazing work, this is an truely amazing software !

cculianu commented 2 years ago

Hmm.. I have to think about this.

Just a few points:

I still have to think about this more but a potential way to do it would be the DoSey implementation ideas you suggest and just have it as a "WARNING DO NOT USE THIS ON PUBLIC SERVERS". This way it could just be data served up to a block explorer for a Fulcrum server that is not public. Or.. somesuch.

I will need to continue thinking about this. Obviously if Electrum protocol 1.5 does end up being adopted widely enough, Fulcrum will have to implement this eventually anyway...

Pantamis commented 2 years ago

Thanks for your insights !

About the CTOR issue: we can still use the fact that the spending transaction txnum will be greater than the txnum of the first transaction in the block including the funding transaction (which will be less than the txum of funding transaction but quite close). So this "hacky" trick to guess where the spending transaction is can still be used (but slightly less effective). That said, if you are fine with BTC-specific stuff, I would implement the most optimal for BTC if I have the time and motivation (I am not good at any other programming language than Python but I can be very persistent X-D ). I think the original use case of this method is to improve LN management in Electrum so I don't see much usage outside of BTC anyway.

I don't think we need the spk argument, we should be able to retrieve the scriptPubKey by loading the funding transaction directly (this argument is only needed by personnal servers that only index some addresses on the fly). I agree with you that this is really a "quick and dirty" implementation of the method anyway. This is clearly something to use only for a private instance of Fulcrum (or small community). The main use case is a Raspberry Pi node with a personal block explorer. Having the spending transaction is quite nice to do "home made chainalysis" of its own funds :p

Let me know if (and when) you plan to work on this ! I would be very happy to test a PR for you with my blockexplorer :) loading 4000 outpoints at the same time will be a good test of the performances :p !

cculianu commented 2 years ago

Thanks for your insights and thoughts and offer to help. I definitely will ask you to test this when I implement it! I may get some free time in a few weeks to spend time on it.

I am just tempted to go the "use a real index for this" route since I am a performance nazi and the fact that it could potentially be slow irks me. Compared to the size of the BTC blockchain for instance, another ~10G index isn't THAT BAD. But then again, that's on top of an already high 80G cost or whatever for Fulcrum db against a BTC blockchain. Hmm.

Hmm..f the usecase is "personal use", the hit won't be too bad to do the slower lookups without a real index. And each index we add does eat into sync time performance as it's one extra disk write that needs to happen. Hmm. I'll ponder this some more. I would definitely start out by testing either approach and seeing what the real tradeoffs are before I make a final decision. I'm super undecided until then....

Right, I can see how this is more useful for LN. I definitely would initially lean on the "optional feature" in either case and if push comes to shove, BTC-specific is fine. On BCH we don't have LN (although we could technically implement it if we needed it, but tx volume is low relative to block size... as are fees... so we haven't really done so...).

Hmm. I have to ponder the design and try out a few ideas. I'll let you know if/when I seriously start to work on it, for sure.

Pantamis commented 2 years ago

We are on the same page :), I also think that 10G added in indexing wouldn't be so bad (in ElectrumX I would be crying because the initial syncing is so bad and crashing so often but Fulcrum sync is much more reliable...). The current Fulcrum DB is 100GB in size for BTC on my node, 10GB more is not much too me. But I also think it is better to unset this index by default at least to start, because it is quite "application specific".

If you go to the index solution I think it would be possible to leverage the scripthash index much more efficiently than what is possible with ElectRS. An idea I have is the following:

  1. Check if the outpoint is spent (UTXO check), if not DONE, else continue
  2. Fulcrum looks at the outpoint's scripthash and loads the txnum of all transactions involving it using the scripthash_history index
  3. An outpoint_num index is built and store for an outpoint (stored as a couple (txnum, output_index) ) how many transactions after the funding transaction there are in the list of transaction returned by the scripthash_history index before hitting the spending transaction
  4. Using this index, Flucrum directly goes to the txnum of the spending transaction in the list of transaction of this scripthash. Return its hash. DONE

Why using scripthash_history and not directly storing the txnum of the spending transaction ? Because we expect the number of transactions of this scripthash between the funding and the spending to be very low. This means we don't have to store big number and it is possible to set a very small size for the output type of the outpoint_num index (one or 2 bytes maybe), we use the fact that there is only one spending transaction to avoid storing the same txnum twice. Each rows stored is then 6 bytes (funding TxNum) + 2 bytes (output_index, necessarly less than 65000 because of the blocksize limit) + 2 bytes (numbers of transaction between spending and funding, output of the index). It can be made BCH compatible by using some bits to signal negative value, in case the spending transaction is before the funding in the BCH blockchain.

I think it even possible to reduce even more the size of the outpoint index: when an outpoint is spend by the very next transaction involving its scripthash, we could just .... not store anything. Then Fulcrum knows that the next txnum in the list returned by scripthash_history is the spending transaction (we already tested that it was not a UTXO). And this will be the overwhelming majority of the case. So really, the outpoint index can be very VERY small.

That's say it is still possible that a scripthash is spammed with dust so maybe we would have to add some exceptions to this schema ?

Anyway, that's quickly how I see it and I think it leverages already build indexes very efficiently. I am very glad you will work on the index path haha :)