algorand / go-algorand

Algorand's official implementation in Go.
https://developer.algorand.org/

Multiple-block opcode computation? "AVM Snapshots" #5570

Open HashMapsData2Value opened 1 year ago

HashMapsData2Value commented 1 year ago

Problem

With the pending EC math PR, Algorand will be getting its biggest opcodes yet. Unfortunately, the pairing-check opcode is so expensive that it will be limited to LogicSigs.

However, there are surely even bigger opcodes that would be very useful for Algorand to provide. For example, the ability to verify ZKPs beyond Groth16, e.g. ZK-SNARKs without a circuit-specific trusted setup such as PLONK, or different constructions like Bulletproofs.

Solution

Would it be possible to make use of the existing box storage functionality to introduce opcodes whose cost far exceeds the maximum opcode budget, but which can save intermediate steps to the smart contract?

You could imagine a big opcode that requires a total opcode budget of 10 blocks. The caller would need to call the smart contract ten times, each time pulling in the stored state, doing the computation it has budget for, and then storing the result back to the smart contract.
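
For illustration, here's a rough Go sketch of that calling pattern, with the intermediate state serialized the way a box would hold it between calls. The state layout, step size, and names are made up for the example; nothing here is an existing AVM API.

```go
// Illustrative only: a computation split into fixed-budget steps, with the
// intermediate state serialized between steps the way a box would hold it
// between app calls. The layout and step size are hypothetical.
package main

import (
	"encoding/binary"
	"fmt"
)

// step resumes from the stored state, does one call's worth of work,
// and returns the state to store back plus whether the result is ready.
func step(state []byte) (next []byte, done bool) {
	i := binary.BigEndian.Uint64(state[:8])   // how far we've gotten
	acc := binary.BigEndian.Uint64(state[8:]) // running result

	// Do as much work as one call's opcode budget allows (here: 1000 iterations).
	for n := uint64(0); n < 1000 && i < 10000; n++ {
		acc += i * i // stand-in for the real heavy computation
		i++
	}

	next = make([]byte, 16)
	binary.BigEndian.PutUint64(next[:8], i)
	binary.BigEndian.PutUint64(next[8:], acc)
	return next, i == 10000
}

func main() {
	state := make([]byte, 16) // "box" contents created by the start call
	for calls := 1; ; calls++ {
		var done bool
		state, done = step(state) // one app call per step, state lives in a box
		if done {
			fmt.Printf("finished after %d calls, result=%d\n", calls, binary.BigEndian.Uint64(state[8:]))
			break
		}
	}
}
```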

I imagine there would be two approaches to the problem:

  1. Create a general way of taking a "snapshot" of Algorand virtual machine state and whatever variables are in memory at the time.
  2. Tailor new, large, opcodes to be "stretchable".

The first solution would make Algorand smart contracts incredibly versatile, but I suspect it would be difficult, since the AVM makes use of forks of different libraries that probably handle memory in different ways. With the second solution you'd at least ensure it works at the opcode level.

Dependencies

N/A

Urgency

Low, until Algorand starts getting serious competition from chains that make liberal use of ZKP.

joe-p commented 1 year ago

The work being done in #5528 seems like a more elegant solution for supporting larger opcodes. I suppose the main question would be why you want these opcodes to be available directly in stateful applications rather than simply using atomic logic sigs?

HashMapsData2Value commented 1 year ago

Because there are applications where you need to maintain state across time?

jannotti commented 1 year ago

#5528 will certainly help, and will come out alongside #4924 in order to make those EC math opcodes useful.

It's certainly possible that someone would need much more time than even that will allow, though. Speaking very approximately, an opcode tick is about 15 ns. With 16 logic sigs, you can have 320,000 such ticks, so about 320*15 µs ~ 5 ms of compute. (With apps, you can get 272*700 ticks ~ 180,000, so ~2.7 ms, but it will cost a lot more in txn fees.)
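
For reference, a back-of-the-envelope version of that arithmetic in Go. The per-program budgets (20,000 ticks per LogicSig, 700 per app call) and the group limits (16 transactions, 256 inner transactions) are my reading of the usual consensus parameters, so treat the exact constants as assumptions; they land in the same ballpark as the figures above.

```go
// Rough reproduction of the budget arithmetic above. The constants are
// assumed consensus parameters, not quoted from the comment.
package main

import (
	"fmt"
	"time"
)

func main() {
	const tickNs = 15 // ~15 ns per opcode tick

	lsigTicks := 16 * 20_000 // 16 LogicSigs in a group, 20,000 budget each
	fmt.Printf("logicsigs: %d ticks ≈ %v\n", lsigTicks, time.Duration(lsigTicks*tickNs)*time.Nanosecond)

	appTicks := (16 + 256) * 700 // 16 top-level app calls + 256 inner txns, 700 each
	fmt.Printf("apps:      %d ticks ≈ %v\n", appTicks, time.Duration(appTicks*tickNs)*time.Nanosecond)
}
```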

Anyway, suppose you don't need 5ms of compute time, you need 500ms of compute time. If a real use case cropped up with that kind of demand, I think it would be doable but only for a few transactions per block (of course). The key is allowing the nodes to perform the computation between blocks, when there is already slack time as the network portion of the protocol is passing messages around and waiting for timeouts.

So I would propose something like: a transaction can prepare a special kind of inner transaction that is going to run statelessly in between blocks, and the result gets recorded (somehow) in the next block.

By statelessly, I mean the same kinds of restrictions that apply to LogicSigs - the initiating transaction would have to bundle up all the state it needs, so that execution of this special inner transaction can be performed without referring to the state of the blockchain (which is sort of moving underneath it, as the next blocks are being prepared). This property means we can execute these things concurrently during the slack time, even while preparing the next block.
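
A minimal sketch of that concurrency property: because each deferred evaluation is bundled with everything it needs, a node could evaluate all of them in parallel during the slack time without touching the moving ledger state. The types and names below are hypothetical, not existing go-algorand code.

```go
// Hypothetical shape of "run these stateless jobs between blocks".
package main

import (
	"fmt"
	"sync"
)

// DeferredEval is a self-contained job: a program plus all inputs it may read.
type DeferredEval struct {
	ID      uint64
	Program []byte
	Inputs  [][]byte
}

// Result is what would get recorded (somehow) in the next block.
type Result struct {
	ID     uint64
	Output []byte
}

// runBetweenBlocks evaluates every pending job concurrently during slack time.
func runBetweenBlocks(pending []DeferredEval) []Result {
	results := make([]Result, len(pending))
	var wg sync.WaitGroup
	for i, job := range pending {
		wg.Add(1)
		go func(i int, job DeferredEval) {
			defer wg.Done()
			// Stand-in for stateless AVM evaluation of job.Program over job.Inputs.
			results[i] = Result{ID: job.ID, Output: []byte(fmt.Sprintf("result-of-%d", job.ID))}
		}(i, job)
	}
	wg.Wait()
	return results
}

func main() {
	pending := []DeferredEval{{ID: 1}, {ID: 2}, {ID: 3}}
	for _, r := range runBetweenBlocks(pending) {
		fmt.Printf("job %d -> %s\n", r.ID, r.Output)
	}
}
```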

I suppose these transactions would have to be prepaid by the initiator, and we need some way to record their result on the chain that can't conflict with anything that has happened in the meantime. For example, I have considered the idea that the result is given back as a sort of "callback" to the app that initiated the transaction. But what if the app gets deleted in the first txn of the next block?

I don't think it's insurmountable, but it's a lot of work, and we'd end up with a slightly unusual programming model. We'd need a solid use case whose owners cared enough that we'd expect them to really work through the challenge.

PS: An AVM execution can grow pretty large during execution. The stack alone can hold 4 MB. So these executions would be hard to checkpoint into boxes.

HashMapsData2Value commented 1 year ago

I'm wondering if there are any unintended negative side effects of having opcode computation run in between blocks, particularly with nodes coming on and off? Though I understand it would make optimal use of node hardware.

I was thinking it could be done in increments. A particularly beefy opcode might be set at 10 increments. You'd first initialize it with a start txn, including all of the input data it needs, as you said, to ensure it stays stateless across its computation lifecycle. Then, over 9 further blocks (consecutive or not, in case block demand shoots up), you pay to have the opcode computation progress another step, then another, until the opcode has finally completed and reveals a result.

Of course, I imagine it would require a lot more work to ensure that the opcodes themselves are "checkpointable" at the function level. Each opcode would need to be handcrafted.
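
To make that concrete, here is one possible shape for a "checkpointable" opcode, sketched as a Go interface. Nothing like this exists in the AVM today; it's only meant to illustrate what handcrafting per-opcode checkpoints might look like.

```go
// Package avmsketch is a hypothetical sketch, not go-algorand code.
package avmsketch

// SteppableOpcode is an opcode whose evaluation can be split across several
// app calls, carrying its intermediate state as opaque bytes (e.g. in a box).
type SteppableOpcode interface {
	// Init validates the inputs and returns the initial checkpoint.
	Init(args [][]byte) (checkpoint []byte, err error)

	// Step resumes from a checkpoint, spends at most budget ticks of work,
	// and returns either the next checkpoint or the final result.
	Step(checkpoint []byte, budget int) (next []byte, result []byte, done bool, err error)
}
```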