AntelopeIO / leap

C++ implementation of the Antelope protocol

Allow max-transaction-time to be set to relative (negative) value to chain consensus #1754

Open matthewdarwin opened 11 months ago

matthewdarwin commented 11 months ago

Allow (an equivalent option to) max-transaction-time to be set to a relative (negative) value against the on-chain consensus value. The reason is discussed in the Antelope Developers Telegram:

Matthew Darwin, [10/11/23 6:20 P.M.] Currently we have max-transaction-time set to a higher value on the BP node than any other node. This setting is to prevent the BP from getting into a situation where the relay node passes transactions on the boundary of the CPU limit and, when the BP runs the transactions, they take a bit longer and you end up with empty blocks. With the suggestion to remove max-transaction-time, what is the way to prevent this abuse vector?

Kevin Heifner, [10/11/23 6:25 P.M.] As a BP operator you may still wish to set a non-default smaller value on your relay node to your BP to prevent this. Or use your fastest hardware for your BP nodes. Setting it larger than the on-chain value allows all other nodes in the network to propagate trxs without hitting a limit smaller than the on-chain max.

Also the other subjective mitigations (3-strike rule, subjective cpu billing) should continue to provide protection.

Matthew Darwin, [10/11/23 6:27 P.M.] Ok, so keep like now, just increase the limits.

Kevin Heifner, [10/11/23 6:29 P.M.] The idea here is to use the on-chain consensus values to control network wide max trx limits. BPs can still fine tune configurations as needed. Feedback on if that fine tuning is actually needed will be welcome.

Matthew Darwin, [10/11/23 6:30 P.M.] Well the node will need adjusting manually since I can't set it to the "consensus value - 10ms"

Kevin Heifner, [10/11/23 6:32 P.M.] True, but much easier for BPs to manually tune than the whole network to change. A consensus value - x config option could be explored. We are hopeful that the other subjective mitigation will make that not necessary.

Matthew Darwin, [10/11/23 6:33 P.M.] One would hope, but people are creative in what they try.
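To make the request concrete, here is a minimal sketch of how a signed relative setting could be resolved against the on-chain consensus value so that something like "consensus value - 10ms" tracks on-chain changes automatically. The option layout and helper names below are hypothetical, not existing Leap code or configuration:

```cpp
// Illustrative sketch only; the option structure and helper names are hypothetical.
#include <algorithm>
#include <chrono>
#include <optional>

using std::chrono::milliseconds;

// Hypothetical configuration: either an absolute override (current behavior)
// or a signed offset relative to the on-chain consensus value (proposed).
struct max_trx_time_config {
   std::optional<milliseconds> absolute;  // e.g. 150ms, as max-transaction-time works today
   std::optional<milliseconds> relative;  // e.g. -10ms, meaning "consensus value - 10ms"
};

milliseconds effective_max_trx_time( const max_trx_time_config& cfg,
                                     milliseconds on_chain_max /* derived from max_transaction_cpu_usage */ ) {
   if( cfg.relative )   // relative setting follows on-chain changes automatically
      return std::max( on_chain_max + *cfg.relative, milliseconds{1} );
   if( cfg.absolute )   // absolute override must be re-tuned by hand when consensus changes
      return *cfg.absolute;
   return on_chain_max; // default: just use the consensus value
}
```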

heifner commented 11 months ago

Additional commentary from @arhag

Areg Hayrapetian, [10/11/2023 9:24 PM] It should probably allow a positive value as well (and maybe a relative difference is preferred over an absolute difference).

A positive relative difference makes more sense to allow more margin on the BP node than the value the rest of the network is expected to use. That way only the actual block producing nodes need to be configured differently and all of the rest of the nodes along the path can just use the on-chain value.

That may mean adjusting the on-chain value down a bit to give a little bit of room for the margin on the BP node if 150 ms is considered to be the absolute max a BP is willing to give to a transaction.

Areg Hayrapetian, [10/11/2023 9:31 PM] Note that if this new option specifies a positive difference, it would be considered differently than just having the equivalent max-transaction-time.

It would tolerate going beyond the on-chain value when it comes to the wall-clock deadline while still respecting the on-chain maximum in terms of what can be billed. Effectively, when specified as a positive relative value, it is a compensation to the BP's clock: it makes it seem like the CPU is running faster than it actually is when determining whether the transaction meets the threshold to be accepted or rejected, but the BP can still use the real clock (though limited by the on-chain max value) in terms of what it is allowed to bill.
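A minimal sketch of that split, assuming a hypothetical positive margin setting on the producer (not an existing Leap option): the wall-clock deadline the producer enforces gets the margin, while the amount actually billed stays capped by the on-chain maximum.

```cpp
// Illustrative sketch only; names are hypothetical, not Leap code.
#include <algorithm>
#include <chrono>

using std::chrono::microseconds;

struct producer_limits {
   microseconds on_chain_max; // on-chain max_transaction_cpu_usage
   microseconds bp_margin;    // hypothetical positive relative value configured on the BP node
};

// Deadline the producer enforces on its own clock while executing the transaction;
// it may exceed the on-chain value by the configured margin.
microseconds wall_clock_deadline( const producer_limits& l ) {
   return l.on_chain_max + l.bp_margin;
}

// CPU amount recorded in the transaction receipt: the real measured time,
// but never more than the on-chain maximum.
microseconds billable_cpu( const producer_limits& l, microseconds measured_elapsed ) {
   return std::min( measured_elapsed, l.on_chain_max );
}
```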

Areg Hayrapetian, [10/11/2023 9:53 PM] And actually, now that I am thinking about it, I think at some point we need to re-think the role of the on-chain max transaction time (max_transaction_cpu_usage).

It is currently serving two roles.

One is as a suggestion to all nodes of what the recommended wall-clock enforcement time should be for any given transaction when evaluating whether to include it in a block (as a BP) or whether to pass it along on the p2p network (if not a BP). This is only a recommendation, since it is subjective to each node and cannot be objectively judged by third parties; additionally, the official nodeos provides limited ways to bypass the recommendation, and of course anyone can tweak the code to bypass it arbitrarily.

The second role is as a maximum value of CPU that can be charged per transaction. The BP is acting as an oracle of key metrics that should drive the CPU bill. Specifically, they measure what they believe the computation time will be for most nodes in the network and assign a value to the CPU bill so that the transaction payer is constrained by the cost, thus preventing them from cheaply forcing all nodes in the network to spend time processing transactions (assuming CPU was priced correctly). But since CPU is subjective, the mapping of the subjectively measured wall-clock time for processing a transaction to the number of microseconds recorded for CPU in the transaction receipt is a choice made by each BP. Granted, it is a choice made by the official nodeos without any options currently to tweak that mapping; but in theory each BP could run custom software that changes that mapping. There is no objective validation of the mapping possible since it is dependent on a subjectively measured input. The only objective validation possible on the CPU amount in the transaction receipt is that it is within the bounds of the minimum allowed value and the maximum allowed value currently configured on chain.

But there may be a desire to change that mapping in the future. For example, maybe we want to convert NET from bytes to microseconds with some ratio and add it to the wall-clock time. That could allow simplifying the resource model a little bit by removing NET and only having developers and users worry about two resources: CPU and RAM. Or we may want to prevent abuse of history solutions by adding a fixed cost per inline action (or a dynamic cost that depends on the bytes in the payload of each inline action) to the CPU. In this case, CPU further deviates from its original intention of just capturing the subjectively measured wall-clock time it took to process a transaction, and instead it becomes a subjectively defined "gas usage" of the transaction. The changes to how CPU is billed that I described are all possible to make (and adapt as needed depending on changing conditions) without a hard-fork. But it comes with the limitation that the total billed CPU amount (in microseconds) cannot exceed the value of max_transaction_cpu_usage on chain.

So I think those roles will need to eventually be decoupled.

Areg Hayrapetian, [10/11/2023 9:53 PM] There are actually quite a few parameters I can think of where there is coordination value in putting them on-chain but where their values are not actually necessary for blockchain validation. I will refer to this class of parameters as "on-chain recommendations". So they do not actually need to be part of the Antelope protocol (at least the part of the protocol dealing with blockchain validation). But it may be good to define them as part of some higher-level standard within the larger Antelope protocol, since they will be relevant to all node implementations of the Antelope protocol (e.g. Leap) and may even be critical to the proper functioning of the peer-to-peer network of an Antelope blockchain.

One example is the max wall-clock time to enforce when speculatively evaluating a transaction to determine whether to relay or drop it. Another could be the ratio between NET bytes and the microseconds to add to CPU. Another could be the cost (in microseconds) to add per KiB of payload in inline actions. Even if the calculation of the final bill committed into the transaction receipt is subjectively determined by the block producer, it may be a convention of the blockchain network for all honest block producers to follow a particular algorithm that takes as input: the subjectively measured wall-clock time for processing the transaction; the objectively determined computations done by the transaction execution; and the parameters stored as on-chain recommendations.
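As a rough sketch of what such a convention could look like (all parameter names below are hypothetical "on-chain recommendations", not existing chain config), the bill could combine the subjective wall-clock measurement with objective metrics and then be clamped to the objective max_transaction_cpu_usage bound:

```cpp
// Illustrative sketch only; parameter and function names are hypothetical.
#include <algorithm>
#include <cstdint>

struct onchain_recommendations {
   uint32_t net_us_per_kib           = 0;       // microseconds to add per KiB of NET usage
   uint32_t inline_action_us_per_kib = 0;       // microseconds to add per KiB of inline-action payload
   uint32_t max_transaction_cpu_usage = 150'000; // objective upper bound on the billed amount (us)
};

struct objective_metrics {
   uint64_t net_usage_bytes;             // bytes the transaction occupies in the block
   uint64_t inline_action_payload_bytes; // total payload bytes across inline actions
};

uint32_t billed_cpu_us( uint64_t measured_wall_clock_us,   // subjective input measured by the BP
                        const objective_metrics& m,        // objective inputs from execution
                        const onchain_recommendations& p ) // on-chain recommendation parameters
{
   uint64_t bill = measured_wall_clock_us;
   bill += ( m.net_usage_bytes * p.net_us_per_kib ) / 1024;
   bill += ( m.inline_action_payload_bytes * p.inline_action_us_per_kib ) / 1024;
   // The only objectively enforced constraint: the receipt value must stay within
   // the bounds currently configured on chain.
   return static_cast<uint32_t>( std::min<uint64_t>( bill, p.max_transaction_cpu_usage ) );
}
```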

heifner commented 11 months ago

Related: https://github.com/AntelopeIO/leap/issues/1334

bhazzard commented 11 months ago

We'll wait to decide on next steps until we have more information from real-world usage once 5.0.0 has adoption on mainnets.