Simple Subroutines for the EVM

gcolvin commented 5 years ago

eip: 2315 title: Simple Subroutines for the EVM description: Introduces two opcodes to support simple subroutines status: Draft type: Standards Track category: Core author: Greg Colvin (@gcolvin), Greg Colvin greg@colvin.org, Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede) discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941 created: 2019-10-17 requires: 3540, 3670, 3779, 4200

Abstract

This proposal introduces two opcodes to support simple subroutines: RJUMPSUB and RETURNSUB.

Taken together with other recent propoposals it provides an efficient, static, and safe control-flow facility.

Motivation

The EVM does not provide subroutines as primitives.

Instead, calls can be synthesized by fetching and pushing the return address and subroutine address on the data stack and executing a JUMP to the subroutine; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions cost gas and use stack slots unnecessarily. And they create unnecessary complexity that is borne by the humans and programs writing, reading, and analyzing EVM code.

Subroutines

As an alternative, we propose to provide for subroutines with two simple operations:

RJUMPSUB to call a subroutine and
RETURNSUB to return from it.

Taken together with other required propoposals they provide an efficient, static, complete, and safe control-flow facility.

Efficient. Substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines -- as much as 56% savings in gas in the analysis below. Substantial performance advantages for static jumps are reported elsewhere.

Static. All possible jumps are known at contract creation time.

Complete. Together with EIP-4200 it provides a minimal set of control structures for implementing programs -- jumps, conditional jumps, and subroutines.

Safe. Valid programs will not halt with an exception unless they run out of gas or recursively overflow stack.

Prior Art

Facilities to directly support subroutines are provided by all but one of the real and virtual machines we have programmed. This includes physical machines like the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, Sun SPARC, ARM, and a few generations of Intel x86; as well as virtual machines for Lisp, Pascal, Java, Wasm, and the sole exception -- the EVM. The details and complexity vary, but in whatever form these facilities provide for

capturing the current context of execution,
transferring control to a new context, and
returning to the original context
- after possible further transfers of control
- to some arbitrary depth.

Over the years, for the machines we have programmed, native subroutine operations have proven their value for efficient implementation and simple coding of subroutines.

Original Design

The first specification of subroutines for a real machine goes back to Turing's Automatic Computing Engine of 1946:

We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation. ... When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note [on] a list of these notes in one or more standard size delay lines, (1024) with the most recent last.

Turings's machine held data in delay lines made of mercury-filled crystals. We have better hardware now, but the concept is simple and by now familiar. In less archaic language, Turing's full specification is that:

The "subsidiary operations" are subroutines.
The "list of notes" is a stack of return addresses.
When a subroutine is called
- the current program address is pushed on the stack, and
- control transfers to "apply the first instruction of the subsidiary" at the subroutine address.
When a subroutine returns
- the return address is popped off of the stack and
- control transfers to the instruction at the return address.

This design still sees use by virtual machines for Lisp, Forth, Java, Wasm and others.

Specification

We propose to follow Turing's original specification, as applied to the EVM architecture. We support calls and returns with a return stack of return addresses. The EVMreturn stack, like the EVM data stack, is limited to 1024 items. This return stack is accessed via two new subroutine instructions.

Instructions

`RJUMPSUB (0x5e) jmpdest`

Transfers control to a subroutine. The destination address is relative to the current PC.

Decode the jmpdest from the immediate data.

The data is encoded as a two-byte, twos-complement signed integer, stored MSB-first.

Push PC + 1 on the return stack.

Set PC to PC + jmpdest.

The cost is low.

`RETURNSUB (0x5f)`

Returns control to the caller of a subroutine.

Pop PC off the return stack.

The cost is verylow.

Notes:

If a resulting PC to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.
Values popped off the return stack do not need to be validated, since they are alterable only by RJUMPSUB and RETURNSUB.
The description above lays out the semantics of these instructions in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 1 location.)
The return stack is the functional equivalent of Turing's "delay line".

The low cost of RJUMPSUB versus JUMP is justified by needing only to add the immediate two byte destination to the PC and push the return address on the return stack, all using native arithmetric, versus using the data stack with emulated 256-bit instructions.

The verylow cost of RETURNSUB is justified by needing only to pop the return stack into the PC. Benchmarking will be needed to tell if the costs are well-balanced.

Safety

We define safety here as avoiding exceptional halting states, as defined in the Yellow Paper. A validator can always detect three of these states at validation time:

Insufficient stack items
Invalid jump destination
Invalid instruction

A validator can detect stack overflow only for non-recursive programs, so two states will still require tests at runtime:

Stack overflow
Insufficient gas

EIP-3779: Safer Control Flow for the EVM specifies initialization-time validation to detect invalid contracts.

Rationale

Below we explore the advantages of have a subroutines as primitives, design alternatives, and the reasons for our choice of specification.

Efficiency Analysis

We show here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP.

Simple Subroutine Call

Consider this example of calling a fairly minimal subroutine.

Subroutine call, using JUMP:

TEST_SQUARE:
    jumpdest        ; 1 gas
    RTN_SQUARE      ; 3 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas
RTN_SQUARE:
    jumpdest        ; 1 gas
    swap1           ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 50 gas

Subroutine call, using RJUMPSUB:

TEST_SQUARE:
    0x02            ; 3 gas
    rjumpsub SQUARE ; 5 gas
    returnsub       ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total 22 gas.

Using RJUMPSUB versus JUMP saves 50 - 22 = 28 gas -- a 56% improvement.

Tail Call Optimization

Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.

Tail call optimization, using JUMP:

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 33 gas

Tail call optimization, using RJUMP and RETURNSUB:

TEST_SQUARE:
    0x02            ; 3 gas
    rjump SQUARE    ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total: 17 gas

Using RJUMPSUB versus JUMP saves 33 - 17 = 16 gas -- a 48% improvement.

Call Using Data Stack

We can also consider an alternative call mechanism -- call it DATASUB -- that pushes its return address on the data_stack:

TEST_SQUARE:
    0x02            ; 3 gas
    datasub SQUARE  ; 6 gas
    returnsub       ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    returnsub       ; 3 gas

Total 26 gas.

Using DATASUB versus JUMP saves 50 - 26 = 24 gas -- a 48% improvement.

By comparison, the RJUMPSUB version saves 28 gas -- a 56% improvement.

_Note: The gas cost for DATASUB is 6 here versus 5 for RJUMPSUB because using the wide-integer data_stack is less efficient than using a stack of native integers._

Conlusion

We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP.

Clearly, the benefits of these efficiencies are greater for programs that have been factored into smaller subroutines, but a routine could use 200 more gas than our first example and RJUMPSUB would still use better than 10% less gas than JUMP.

_Note: A stack rotation operator to move items on the stack and implicitly shift the intervening items could simplify code using JUMP. It would be a potentionally expensive operation with a dynamic gas cost._

Design Alternatives

There are at least two designs for a subroutine facility.

Turing's design keeps return addresses on a dedicated stack.

As specified above, the instruction to call a subroutine will

Push the return address onto the return address stack.
Jump to the first instruction of the subroutine.

And the instruction to return from a subroutine will

Jump to the address popped off of the return address stack.

We have chosen Turing's design, as have Forth, Java, Wasm, and others. On a stack machine almost all computation is done on the stack, and on these machines the return information is not conveniently

For these machines, an advantage of keeping a separate return stack is that it leaves to the implementation the representation of the data stack, which can remain opaque to the user. All that matters is that calls and returns work.

An alternative design is to keep return addresses on the data stack.

The instruction to call a subroutine will:

Push the return address onto the data stack.
Jump to the first instruction of the subroutine.

The instruction to return from a subroutine will:

Ensure that the return address is the first element on the stack.
Jump to the address popped off of the data stack.

This design became popular on silicon in the 1970's and remains so. Examples include the PDP 11, VAX, M68000, SPARC, and x86. These are all register machines, not stack machines. The registers are used for computation, and the stack is used by subroutine calls for return addresses and call frames.

For all of these machines instruction addresses fit efficiently into machine words on the data stack.

We give an example of this alternative design above.

We prefer Turing's design for a few reasons.

It maintains a clear separation between calculation and flow of control: the data stack is uncluttered with vulnerable return addresses and it's impossible to overwrite the return stack.
It improves efficiency by
- using native arithmetic rather than 256-bit EVM instructions for the return address,
- not using a data stack slot for the return address,
- and not imposing a need to swap items around return addresses and return addresses around items.
It has a 76-year history of success, especially on stack machines.

Keeping code simple is good. And keeping control flow opaque has clear safety advantages -- the state of the VM -- the stack, frame, and instruction pointers -- is not made visible or mutable as data.

Finally, given that the EVM is a stack machine with very wide words, the performance advantages and the decades of successful use of Turing's design by similar machines weighed heavily in our decision.

Backwards Compatibility

These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.

Test Cases

Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x60045e005b5d (PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	5	[]	[]
3	RETURNSUB	5	[]	[0]
4	STOP	0	[]	[]

Output: 0x Consumed gas: 10

Two levels of subroutines

This should execute fine, going into one two depths of subroutines

Bytecode: 0x 00000000000000c5e005b60115e5d5b5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	5	[]	[]
3	JUMPSUB	5	[]	[0]
4	RETURNSUB	5	[]	[0,3]
5	RETURNSUB	5	[]	[3]
6	STOP	0	[]	[]

Consumed gas: 20

Failure 1: invalid jump

This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: (PUSH9 0x01000000000000000c, JUMPSUB,0x6801000000000000000c5e005b60115e5d5b5d, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	10	[18446744073709551628]	[]

Error: at pc=10, op=JUMPSUB: invalid jump destination

Failure 2: shallow `return stack`

This should fail at first opcode, due to shallow return_stack

Bytecode: 0x5d5858 (RETURNSUB, PC, PC)

Pc	Op	Cost	Stack	RStack
0	RETURNSUB	5	[]	[]

Error: at pc=0, op=RETURNSUB: invalid retsub

Subroutine at end of code

In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error

Bytecode: 0x6005565b5d5b60035e (PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)

Pc	Op	Cost	Stack	RStack
0	PUSH1	3	[]	[]
2	JUMP	8	[5]	[]
5	JUMPDEST	1	[]	[]
6	JUMPSUB	5	[]	[]
2	RETURNSUB	5	[]	[2]
7	STOP	0	[]	[]

Consumed gas: 30

Security Considerations

These changes introduce new flow control instructions. They do not introduce any new security considerations. In concert with EIP-3779: Safer Control Flow for the EVM they will increase security by providing for validated control flow.

References

A.M. Turing, "Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)." Report E882, Executive Committee, NPL February 1946. Available: http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.htm

B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977. Available: https://doi.org/10.1093/comjnl/20.3.269

Copyright

Copyright and related rights waived via CC0.

lialan commented 5 years ago

After EIP615 died, here is another chance to improve this archaic-ish VM architecture. I myself wholeheartedly wish this can be accepted.

I work on compilers and virtual machines. Here is my personal evaluation:

Compiler support : very, very simple
VM support: simple and backward compatible
Impact: HUGE, though not as big as EIP615, but let's take baby steps
Gas impact: slightly saves gas.

I also wish SUB could take a parameter -- making it a 3 byte instruction. I don't understand people's fear of instructions longer than one byte.

axic commented 5 years ago

I also wish SUB could take a parameter -- making it a 3 byte instruction.

This is one of the reasons EIP-615 run into issues.

I don't understand people's fear of instructions longer than one byte.

I'm really in favor of having multi-byte instructions, but that requires EVM versioning. An example why doing it without versioning is risky is demonstrated here: https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/11

lialan commented 5 years ago

@axic That specific problem exists already because we have variable opcodes (PUSH) from the beginning. Also, if the analysis is smart enough, it should be able to detect that we are jumping into the middle of an instruction.

And what is the best way to avoid that? Make dynamic jumps obsolete -- that was in the EIP615 proposal!

I also think EVM versioning will happen sooner or later, unless we want to forever keep EVM as is.

axic commented 5 years ago

That specific problem exists already because we have variable opcodes (PUSH) from the beginning.

As you say that opcode is there since the beginning, which means every VM and tool is prepared for it.

gcolvin commented 5 years ago

Also, the PUSH opcodes specify how much immediate data to push, whereas BEGINSUB needs a variable amount of immediate data because different subroutines can take different numbers of arguments.

gcolvin commented 5 years ago

Edited for longer names, and to give RETURNSUB the same semantics as #615 -- restoring SP to that of the caller. This makes it much easier to use and increases forwards compatibility.

gcolvin commented 5 years ago

Nope, can't use RETURNSUB semantics without BEGINSUB. Sigh. Upside is you can write functions that take and return a variable number of arguments.

lialan commented 5 years ago

BEGINSUB needs a variable amount of immediate data because different subroutines can take different numbers of arguments.

I think only the 2-byte jump target address as immediate operand is good enough, or we can have an additional "number of arguments pushed on stack" byte in BEGINSUB.

Having static subroutine jumps is big deal, the good thing is that if we do it now, it will not break the existing convention.

lialan commented 5 years ago

So another opcode for tagging the target address of subroutines. This will deliberately disallow previous internal calling convention (using JUMP/JUMPI) using the subroutine.

We know that we definitely don't want this kinds of backward compatibilities, but I feel that it will create a little bit of inconsistency.

MrChico commented 5 years ago

@gcolvin can you clarify a few things? The current wording mentions how SP is initially set but then never mentions it again. Is SP still relevant to this proposal? Am I correct in understanding that when encountering a GOSUB, the entire data stack is pushed to the return stack and then cleared? And that upon a RETSUB, the current data stack is disregarded and the entire return stack is pushed to the data stack and cleared? Or only the part of it that came from the last GOSUB? If so, how are we keeping track of this?

gcolvin commented 4 years ago

@lialan Thanks. There is now a SUBDEST opcode to tag valid GOSUB targets. There are no restrictions here on how these instructions are used. E.g. multiple entry points, jumps into and out of subroutines, and other abominations are allowed. It's up to language and tool implementers to establish best practices. To the extent that they can be statically checked some things could be validated before runtime per @shemnon' s https://github.com/shemnon/EIPs/pull/1.

gcolvin commented 4 years ago

@MrChico Thanks. SP is unused, so I've removed it. It's unused because GOSUB and RETSUB have no effect on the data stack--it's up to the caller. This often happens by default on a stack machine: the caller pushes arguments which are consumed by the callee, leaving any results on the stack.

shemnon commented 4 years ago

The PR mentioned two comments above is now #2348

MrChico commented 4 years ago

@MrChico Thanks. SP is unused, so I've removed it. It's unused because GOSUB and RETSUB have no effect on the data stack--it's up to the caller. This often happens by default on a stack machine: the caller pushes arguments which are consumed by the callee, leaving any results on the stack.

I'm confused. The issue still refers to SP:

Execution of EVM bytecode begins with SP set to 0,

and your claim that GOSUB have no effect on the data stack seems to contradict the semantics section which claims:

[GOSUB] sets PC to the address on top of the data stack and pops the data stack.

If this was a PR instead of an issue it would be easier to review and track changes to this proposal

gcolvin commented 4 years ago

You are right on all counts @MrChico. I thought I had removed the SP reference, but it was still there--it's gone now. My comment on GOSUB was wrong, it's only RETSUB that leaves SP unchanged--I think the semantics section is right. And a PR is needed, but earning a living is taking priority, and editing an issue is easier than dealing with Git. I appreciate your patience.

gcolvin commented 4 years ago

@MrChico @shemnon @lialan @axic I've pared this down, cleaned it up, and submitted it as a draft: https://github.com/ethereum/EIPs/pull/2484

fanlong commented 4 years ago

@gcolvin

There is some inconsistency between the test case and the spec in EIP-2315. Particularly,

The spec says that BEGINSUB is not supposed to be executed and its execution will cause error. JUMPSUB will land on the next instruction after BEGINSUB.

However, the test case (and the current open-ethereum implementation) assumes BEGINSUB can be executed as a noop. JUMPSUB will land on the BEGINSUB instead of the next.

Note that I believe the spec makes more sense from the security perspective. It prevents unintended control flow behavior in EVM crossing routine boundaries.

axic commented 4 years ago

Can you please comment this on https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941 which is the discussion url?

(This issue here should be closed.)

gcolvin commented 3 years ago

Reopened as working draft. Will push to EIP repo at stable points. I'm coming to see that more can be achieved by way of safety and ease of static analysis with this proposal than I even realized, and am working it out here.

gcolvin commented 3 years ago

A good summary of the history of subroutines, from Lovelace and Turing onward

https://people.cs.clemson.edu/~mark/subroutines.html

and the subroutine facilities provided by many CPUs

https://people.cs.clemson.edu/~mark/subroutines/

including the following:

704
709x
alpha
amd29k
b5000
cdc6600
i960
itanium
m68k
m88k
mips
pdp8
pdp11
ppc
s360
sparc
vax
x86

ethereum / EIPs