VM: `step` event's stack can be mutated after it is emitted

davidmurdoch commented 5 years ago

This could be a non-issue, but thought I'd bring it up just in case.

We are fixing a bug in ganache-core's gas estimation algorithm and needed to inspect the stack emitted by the step event. The way our estimation works requires that we store some of the emitted objects for inspection after the transaction is complete.

The first issue arose when we realized that the returned stack is the vm's internal stack array (i.e., it is the same array instance); this array is mutated all the time. To get around this we just cloned the stack as soon as the event is emitted (via op.stack = op.stack.slice(0)).

This appeared to work for some cases, but in other cases we found that the values in our cloned array were sometimes still being mutated later.

The root issue (for us) is that CALL's gasLimit (which was popped of the stack) sometimes gets reassigned due to the way maxCallGas works internally, and then a gasLimit.iaddn call mutates the value.

Here is maxCallGas:

https://github.com/ethereumjs/ethereumjs-vm/blob/00d8fe3c7941fe571581fff1cd4e327980928127/lib/evm/opFns.ts#L902-L905

which you can see sometimes returns the gasLimit instance passed in, and sometimes does not.

Here is where the reassignment to gasLimit (sorta sometimes) happens:

https://github.com/ethereumjs/ethereumjs-vm/blob/00d8fe3c7941fe571581fff1cd4e327980928127/lib/evm/opFns.ts#L713

And finally, here is the instance-changing call to gasLimit.iaddn:

https://github.com/ethereumjs/ethereumjs-vm/blob/00d8fe3c7941fe571581fff1cd4e327980928127/lib/evm/opFns.ts#L713

What I suspect should happen, is that the step event should deep clone the internal stack array before emitting it (which unfortunately adds some additional overhead) or, migrate the stack to using JSBI (and BigInt itself, eventually) as instances of it can't be mutated (sorta, JSBIs can be muted via internal private methods/fields -- thought it is trivial to wrap JSBI to prevent private member access).

I could see a way in which this issue is considered a moderate security issue, as step listeners can modify the internal EVM stack while it is running.

note: we fixed this for ourselves in ganache by doing op.stack = op.stack.map(val => val.clone()) as soon as we receive the step event. This is not an issue for ganache right now and won't affect our future release schedule.

alcuadrado commented 5 years ago

Hey! Thanks for investigating this!

I can imagine that it must have been very hard to figure out what was happening.

Unfortunately, I believe cloning everything before emitting would completely destroy the VM's performance. I'm inclined to think that this should be documented.

About big numbers being modified. I believe we should stop using them as mutable objects. That always leads to errors. IMO we should eventually migrate to native bigints anyway, which are immutable.

I created #606 with this change. Can you confirm that using a shallow copy works now, @davidmurdoch? If you can also compare this PR's performance that would be great.

If we found that this PR has a big enough performance hit, we should profile and optimize it. Most probably only a few usages of the mutable methods are performance-relevant.

I could see a way in which this issue is considered a moderate security issue, as step listeners can modify the internal EVM stack while it is running.

Unfortunately, I don't see a situation where the VM can be executed with untrusted code in a secure way. One can always do things like monkey-patching it.

The only interesting development in that area is SES from Agoric/Mark Miller, who's been working on this like forever. This is an interesting article on that.

cgewecke commented 5 years ago

Leaving some additional info about how performance impact might be evaluated...

Have recently benchmarked zeppelinsolidity-contracts tests and measured the number of times they fire the step:

# Tests	CircleCI Time	# of steps
> 2200	8 - 10 min	~ 11,250,000

That suite is relatively small/fast...Aragon and JoinColony have tests which take 60 - 90 minutes to complete. For my use case anyway, it makes sense to evaluate the step's efficiency at the 100 million x scale.

@davidmurdoch You mention that this might have security implications. Could you elaborate? I usually think of ganache/ethereumjs-vm as simulators and imagine that their main concerns are speed and accuracy. However I've seen this question raised at ganache before in similar context ganache-core 381 and am interested in how you view security for these tools.

davidmurdoch commented 5 years ago

@alcuadrado, thanks for getting on this! I won't have time to benchmark this for a while, but this change looks liked it'd work (though I haven't tested yet).

It may be overkill to outright ban the use of BN's instance methods though? Without a linter to ensure it stays that way I imagine it'll be difficult to enforce going forward. I doubt it'll cause any noticeable performance regressions if you do replace all instance methods, but it seems like it might be unnecessary.

Unfortunately, I don't see a situation where the VM can be executed with untrusted code in a secure way. One can always do things like monkey-patching it.

The EVM can (and does, hopefully? haha) execute transactions securely as it is (if not, I really should stop running ganache-cli on an open port for workshops I give :-D).

Also, the EVM may be used for more than just local testing of contracts in the future, like in https://github.com/ethereumjs/ethereumjs-client.

@cgewecke The issue you linked talks about a concern with handling arbitrary javascript sent over RPC -- which could be over conference wifi or in a classroom scenario; if we execute untrusted JavaScript within Node.js we are essentially allowing file system access to unknown/untrusted remote parties.

While I agree ganache's focus isn't on security*, I still think this should be treated as a potential security issue because it allows 3rd party code to modify a private execution environment via a documented API in a way that is unintended by the interface. This bug could potentially be used by a malicious actor -- though I can't think of any obvious attacks off the top of my head.

Also, ganache has plans to expose an EventEmitter interface that would re-emit events like step event to the subscriber; it'd be nice if we could re-emit the object without additional processing.

* we install modules via npm and are opening ourselves up to a world of security issues by doing so.

cgewecke commented 5 years ago

Thanks for that explanation @davidmurdoch, much appreciated!

ethereumjs / ethereumjs-monorepo

VM: `step` event's stack can be mutated after it is emitted #605