bitjson / bch-vm-limits

Retargets limits to make Bitcoin Cash contracts more useful and reduce compute requirements for nodes.

Density-based limits #8

Closed · bitjson closed this issue 1 month ago

bitjson commented 7 months ago

Left this out of my initial 2025 cleanup commit: I think we should also consider a more conservative approach of limiting hash digests by the size of the containing transaction. So hashing would have a density-based limit like SigChecks. (I'd also like to see SigChecks simplified to make validation more stateless – any set of valid transactions should make a valid block, but currently, there are block-level SigChecks limits.)

Context: I'd like to see a lot of mobile wallets internally running pruned nodes for the best privacy and security, and it would be nice to ensure TX validation requirements remain inconsequential vs. bandwidth requirements. So even if the limits as initially proposed are safe for the network itself, more careful targeting can still help to reduce worst-case power usage on mobile devices.


For standard transactions, there's little reason a transaction should ever need more than 1 hash digest iteration per 32 bytes (i.e. double-hashing every hash input, where every hash input is shorter than one 64-byte message block, so there are no efficiency gains from longer messages). Even if the hashed data is provided by the locking script/UTXO, because the pattern would necessarily be P2SH, the data must still be pushed within the transaction itself as redeem bytecode.

For nonstandard transactions, we would set the limit at ~2 digest iterations per byte, the density that is currently possible with today's limits. This would allow experimentation past the standardness limit for transaction relay and also ensure that any currently spendable UTXOs remain spendable indefinitely. (As an aside, I think the 2020 SigChecks changes may have theoretically violated this, so in principle we should probably target slightly higher nonstandard signature operation limits to remedy that: https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/issues/109. I'll mention that OP_CODESEPARATOR would be a problem without SigChecks, and OP_CODESEPARATOR remains useful in some scenarios for allowing signatures to efficiently commit to an executed path within a transaction.)
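
To make those densities concrete, here's a rough sketch of how the budget could be derived from transaction size – hypothetical TypeScript names only; the exact constants, rounding, and any fixed base allowance are illustrative, not a final specification:

```ts
// Illustrative only: hash digest iteration budgets derived from the size of
// the containing transaction, using the densities discussed above.
const STANDARD_DIGESTS_PER_BYTE = 1 / 32; // 1 digest iteration per 32 bytes
const NONSTANDARD_DIGESTS_PER_BYTE = 2; // ~2 digest iterations per byte

const maximumHashDigestIterations = (
  transactionByteLength: number,
  standardValidation: boolean,
) =>
  Math.floor(
    transactionByteLength *
      (standardValidation
        ? STANDARD_DIGESTS_PER_BYTE
        : NONSTANDARD_DIGESTS_PER_BYTE),
  );
```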

So VM implementations would track both digestIterations and signatureChecks over the course of the evaluation, failing the evaluation if they exceed the fixed allowance allocated based on the transaction's size. When validating a full transaction, the value after each input's evaluation can be passed in as the initial value for the next input's evaluation to minimize total computation; validation can stop as soon as a limit is exhausted, with no need to wait for all inputs to be evaluated and the transaction-level metrics to be aggregated. (Note that parallelization of input validation within a transaction is somewhat possible but low-value, as double spending must still be prevented between inputs in the same transaction. It's more useful to parallelize validation at the transaction level, where the same coordination has to happen to prevent double spending anyway.)
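
As a rough sketch of that evaluation flow (hypothetical types and names; not any implementation's actual API):

```ts
// Illustrative only: carry the running totals from one input's evaluation
// into the next, and fail as soon as either transaction-wide budget is spent.
interface EvaluationMetrics {
  digestIterations: number;
  signatureChecks: number;
}

const validateInputs = <Input>(
  inputs: Input[],
  evaluateInput: (input: Input, carried: EvaluationMetrics) => EvaluationMetrics,
  limits: { maxDigestIterations: number; maxSignatureChecks: number },
) => {
  let metrics: EvaluationMetrics = { digestIterations: 0, signatureChecks: 0 };
  for (const input of inputs) {
    metrics = evaluateInput(input, metrics);
    if (
      metrics.digestIterations > limits.maxDigestIterations ||
      metrics.signatureChecks > limits.maxSignatureChecks
    ) {
      return false; // a budget is exhausted; skip the remaining inputs
    }
  }
  return true;
};
```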

tomFlowee commented 7 months ago

I'd also like to see SigChecks simplified to make validation more stateless

This sounds good.

My main gripe is that the validation is impossible to do anywhere other than on a full node. So a point-of-sale machine is incapable of checking whether a transaction exceeds these limits, which may stop it from being mined by conservative miners and raises the double-spend risk.

(To clarify this point: SigChecks requires a (nearly) full VM, and it requires the full output script matching each of the transaction's inputs, since it basically needs to run through the program (script) of each of those.)

This raises a question:

Context: I'd like to see a lot of mobile wallets internally running pruned nodes for the best privacy and security,

What do you mean by "pruned node"?

In full node software, that is implied to mean nodes that download and verify ALL transactions that are mined in a block. They just don't keep all of them.

It should be pointed out that if this is what you mean, then the idea of a mobile wallet doing that is not very realistic. No, let me be blunt: this is an utterly impossible wish in a world where Bitcoin Cash is successful.

Mobile wallets should SPV-validate, though. But that is not relevant to the topic here, as validating the actual script sounds like something a mobile device has no benefit from doing.

bitjson commented 7 months ago

This sounds good. [...] My main gripe is that the validation is impossible to do anywhere other than on a full node. So a point of sale machine is incapable of checking if the transaction has excessive limits which may stop it from being mined by conservative miners. Which raises the double spend risk.

(to clarify this point, sigchecks requires a (nearly) full VM and it requires the full output script matching each of the inputs of the transaction since it basically needs to run through the program (script) of each of those).

I'm not sure if I follow – how else can a payee verify an unconfirmed transaction they've received?

Either they have a trusted full node that relays them the transaction (in which case, the full node did the unconfirmed validation, and they just need to check that it pays them the expected amount/tokens), or they need to run BCH's TX validation function with all standardness + consensus rules (including evaluating all inputs) + the UTXO information (though that + SPV proofs can be passed to the payee over whatever payment protocol is in use). Is there another use case you're thinking about here?

What do you mean with "pruned node" ? [...] It should be pointed out that if this is what you mean, then the idea of a mobile wallet doing that is not very realistic.

I don't want to get too far afield of this issue's topic; I just wanted to provide "pruned nodes on mobile" as an example of why we would choose to be as conservative as possible on computational limits – we want limits high enough to support all theoretical contract use cases, but no higher (even if the difference seems trivial).

On the example: I'd agree that for now most mobile wallet users won't use pruned nodes by default, but I expect that a lot of great consumer mobile and desktop wallets will eventually allow you to enable it as a privacy/security feature. From a user's perspective, it just requires committing some storage (less than 10 GB currently) and bandwidth while active (~55 kB/s with full blocks currently; catch-up costs may even be reduced by UTXO commitments), and it would allow the device to trustlessly and privately interact with any on-chain contract system or sidechain. (In fact, BCH is one of the only token- and contract-supporting cryptocurrency networks in which this sort of lightweight-but-complete mobile client is possible.)
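
(As a back-of-envelope check on that bandwidth figure – illustrative numbers only, assuming full 32 MB blocks every ~10 minutes:)

```ts
// Sustained rate if every block is full:
const blockSizeBytes = 32_000_000; // 32 MB block size limit
const blockIntervalSeconds = 600; // ~10 minute target spacing
const sustainedBytesPerSecond = blockSizeBytes / blockIntervalSeconds; // ≈ 53 kB/s
```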

If consumer-grade desktop hardware continues to be capable of validating the chain with orders of magnitude to spare, there's no reason high-end mobile devices will have trouble either. And block size limit growth will be gradual, so unless mobile devices stop getting better, I don't see any particular cutoff at which this expectation should break.

tomFlowee commented 7 months ago

how else can a payee verify an unconfirmed transaction they've received?

Not sure if I'm following you now – what do you think happens on point-of-sale devices, on thin wallets, and essentially anything bitcoin-like that is not a full node?

What kind of verification do you think is going on?

The answer is that they receive the transaction FROM a random full node via the INV/getData method, or via Fulcrum. And that is enough. Absolutely no thin wallet fetches the outputs that are being spent by a transaction that pays them.

just wanted to provide "pruned nodes on mobile" as an example of why we would choose to be as conservative as possible on computational limits

Ok, but that is a use case that does not exist and has no business existing. It is basically impossible to do what you suggest we optimize for.

I would suggest reasoning about the limits ONLY in the context of full nodes, simply because they are the only ones that will run a VM.

(In fact, BCH is one of the only token and contract-supporting cryptocurrency networks in which these sort of lightweight-but-complete mobile clients are possible.)

Also on BCH this is not possible. It's not a CPU limit, mind you. It's a data (and speed of access) limit.

bitjson commented 7 months ago

Ok, I think I see where you're coming from. I can understand that some applications would be willing to assume that their P2P network peers are non-malicious (as broken/malicious peers get punished with bans over time) and that well-known Fulcrum endpoints are run by operators who are trusted by the community not to return fraudulent transaction data.

Absolutely no thin wallet fetches the outputs that are being spent by a transaction that pays them.

For the wallets and point-of-sale applications I've used or been a part of writing (at BitPay and Bitauth), the trust model is more definitive: either the application fetches UTXOs from an explicitly user-trusted peer, or it validates the payment transactions itself.

That's not to say that the more stochastic trust model above wouldn't work most of the time, but I've never worked with any businesses that are relying on that model.

I would suggest reasoning about the limits to ONLY in the context of full nodes. Simply because they are the only ones that will run a VM. [...] Also on BCH this is not possible. It's not a CPU limit, mind you. It's a data (and speed of access) limit.

I think this anticipates a theoretical future with multi-GB blocks. As things currently stand, it's trivial to run a VM with input data from a trusted source in JavaScript, in-memory, on a web page, many thousands of times per second. With more development, this could be a pruned node too.

I agree in principle though that developers should be careful to design applications that can continue to work even if average BCH block sizes annually double for 10+ years and the UTXO set significantly grows. (Of course, if consumer computing continues to improve at current rates, we'll still be able to run pruned nodes in-browser then too, even without big leaps in UTXO recall via ZKPs or otherwise.)

tomFlowee commented 7 months ago

I can understand that some applications would be willing to assume that their P2P network peers are non-malicious

That's a bit of an irrelevant tangent. It still gives me the impression that you have a view of the world in your head that doesn't actually exist.

For the wallets and point-of-sale applications I've used or been a part of writing [...] either the application fetches UTXOs from an explicitly user-trusted peer, or it validates the payment transactions itself.

In earlier posts here you implied that this point-of-sale application would validate the scripts, specifically running them through a VM to see if they parse.
To do that, you need the UTXOs it spends, indeed – which implies a trusted node, you are correct. You can't validate those scripts without those outputs.

What is the point of using a trusted node if you don't actually trust it to validate the transaction for you?

So I'm hoping you are mis-remembering, because this setup is nonsensical.

That's not to say that the more stochastic trust model above wouldn't work most of the time, but I've never worked with any businesses that are relying on that model.

Well, sorry to be blunt one more time, but literally everyone in the world is doing this. Nobody is even pretending to do what you propose to do.

Everyone leaves the parsing and validation of bitcoin scripts to full nodes. Please do go and check your facts; this one is really not controversial.

As things currently stand, it's trivial to run a VM with input data from a trusted source in JavaScript, in-memory, on a web page, many thousands of times per second. With more development, this could be a pruned node too.

You are wrong. You are looking at the wrong thing. The CPU cost is not the issue, the bandwidth and data storage is the issue.

And I refer to my first reply: you still seem confused about what a pruned node is. I repeat, a pruned node processes 100% of the daily created blockchain. Pruning the data after downloading and processing it is irrelevant to the scaling issues. You haven't addressed that.

I'll unsubscribe from this issue as I've said all that needs to be said.

bitjson commented 7 months ago

What is the point of using a trusted node if you don't actually trust it to validate the transaction for you?

Agreed: either 1) you 100% trust a node to not defraud you about the validity of unconfirmed transactions, or 2) you need to 100% validate the unconfirmed transaction yourself, up to and including VM evaluations. So far as I can tell, there is no "1.5" in which you're A) not 100% trusting a node and B) not validating unconfirmed transactions with a fully functional VM.

A "1.5" model would be security theater – if you don't 100% trust the peer from whom you're receiving the unconfirmed transaction (e.g. you have no business relationship or legal recourse, they're just a randomly-selected, anonymous full node from the network), an attacker can double-spend by getting their own node into that trusted position.

Lots of apps and self-custodial wallets use trust model (1) for unconfirmed transactions (even if they otherwise don't trust the service with their keys), but more advanced wallets and full node-based wallets use trust model (2).

just wanted to provide "pruned nodes on mobile" as an example of why we would choose to be as conservative as possible on computational limits

ok, but that is a usecase that does not exist and has no business existing. It is basically impossible to do what you suggest we optimize for.

If I understand you correctly, you expect that no power-usage-sensitive device can ever use trust model (2) because bandwidth and data storage requirements are too high. I might agree with that assessment for the BSV network, but BCH has a relatively low block size limit, and it is currently – and will remain (even with the adaptive limit) – very reasonable for low-powered, moderate-bandwidth consumer devices to run pruned nodes.

To make it as concrete as possible: I would like a JS-based wallet running on an iPhone to have a "pruned node" setting – when activated, the wallet will somehow download and verify the UTXO set (pruned full sync, UTXO commitments, a ZKP strategy, etc.) and then listen, live, to 100% of transactions on the network (maybe only while on wifi 😄). This would allow the wallet to fully participate in on-chain covenant systems or sidechains with minimum trust and/or information leakage.

Today, this may require up to ~5 GB of bandwidth per day (at 32 MB blocks, depending on peer banning and mempool eviction policies), but realistically, it required only ~8 GB of bandwidth in all of 2023. Likewise, storage requirements may rise at nearly the same rate as bandwidth (tempered by fees and the dust "rent" of creating new UTXOs), but the real UTXO set storage requirement is still well under 10 GB. (I also think further technical improvements will reduce these requirements, e.g. rather than naively listening to all traffic, future clients may be able to receive relevant transactions over direct or filtered channels + per-block UTXO set diffs with zero-knowledge proofs.)
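
And the daily worst case, back-of-envelope (illustrative numbers only):

```ts
// ~144 blocks per day at the 32 MB block size limit:
const blockSizeBytes = 32_000_000;
const blocksPerDay = (24 * 60) / 10; // 144
const worstCaseBytesPerDay = blockSizeBytes * blocksPerDay; // 4,608,000,000 ≈ 4.6 GB/day
```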

So:

it would be nice to ensure TX validation requirements remain inconsequential vs. bandwidth requirements. So even if the limits as initially proposed are safe for the network itself, more careful targeting can still help to reduce worst-case power usage on mobile devices.

Alternatively, the requirements here could be framed as "minimize worst-case power usage on fully-validating nodes" as a second priority, while the top priority remains, "ensure all real use cases remain maximally byte-efficient" (i.e. never need to be padded with extra data to meet excessively low density-based limits).

Minimize data usage first, and from there, minimize worst-case power usage.

bitjson commented 7 months ago

Latest update: https://bitcoincashresearch.org/t/chip-2021-05-targeted-virtual-machine-limits/437/31?u=bitjson

I'll take a swing at these density-based limits over the next few months.

bitjson commented 6 months ago

TODO: review whether the stack memory limit should be density-based as well, and include a rationale section.

bitjson commented 1 month ago

Cleaning up issues now that we have two implementations of the latest proposal. 🚀

All limits are now density based, so I'm going to mark this issue as resolved. Please comment if I should re-open!