ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.32k stars 5.77k forks source link

Allow `verbatim` in Solidity `assembly` blocks #12067

Open dessaya opened 3 years ago

dessaya commented 3 years ago

There is currently support for verbatim, allowing to instert arbitrary bytecode, but only when compiling in strict assembly mode. But the verbatim group of functions ios not available inside assembly blocks in Solidity code. Example:

function f() {
    assembly {
        verbatim_0i_0o(hex"c0")
    }
}

When compiling with solc:

Error: Function "verbatim_0i_0o" not found.

What is the motivation for disabling verbatim in Solidity? I understand that it must be used with care and only for very specific reasons. My use case is that I am targetting a forked version of the EVM interpreter with new opcodes. I'm currently unable to use solc to compile contracts targetting this fork.

Thanks!

chriseth commented 3 years ago

The reason is that we have to think about which optimizer steps to disable, and probably we have to disable almost all of them. Can you tell us more about the new opcodes in the fork?

dessaya commented 3 years ago

For now it's a proof of concept, and it's not even clear that this approach is the correct one.

We want to add support for Solidity as a language for an experimental language-agnostic smart contracts platform. We don't need or want to emulate a whole Ethereum blockchain, since our smart contracts run in level 2; we only need to run the Solidity code and store the current state. What we need is some way to bridge the Solidity code with the underlying sandbox; and for that one idea is to add some custom opcodes, like some form of "syscalls", like "obtain the chain ID" or "publish an event". Again, maybe there is a better approach but for now we are experimenting with this.

Thanks again!

chriseth commented 3 years ago

Wouldn't it be better to do this on the yul level instead? Plese feel free to schedule a call to discuss in more detail!

dessaya commented 3 years ago

What do you mean exactly? I know that I can write a contract in yul, and I can inject arbitrary bytecode with verbatim. This is what we are doing right now; but it would be nice to be able to write Solidity, and have a few small Yul blocks with assembly {...} where needed.

hrkrshnn commented 3 years ago

@dessaya To summarize why we haven't done this so far, the Yul optimizer wasn't designed keeping verbatim in mind. This means that if the Yul optimizer is enabled, then verbatim has to be used with extreme caution. We assume that users of pure Yul already are in a position to make this judgement (we have added some example situations that would lead to undefined behaviour: https://docs.soliditylang.org/en/latest/yul.html#verbatim).

We can of course allow it in Solidity's inline assembly. The question is if the Yul optimizer should be completely disabled or partly; if partly, which ones? It would be useful if you can give a list of additional opcodes that you would like to add this way, to get an idea of how this would work for your case.

chriseth commented 3 years ago

What I meant is to compile solidity to yul using solc --ir and then modify the resulting yul code where you can freely inject verbatim. It would be nice for us to know what exactly these syscalls are and how you want to link them to the solidity code.

dessaya commented 3 years ago

We are still evaluating options, so we haven't settled on what the opcodes will look like exactly. But an example would be something very simple to retrieve a value from the context:

let x := verbatim_0i_1o(hex"c1")

where the return value is provided by our sandbox; for example the (non-ethereum) address of the contract creator.

@chriseth I was not aware that solc --ir actually produced Yul code. It looks like this could be useful for our use case. I'll explore this approach. I think we can close this issue for now. Thanks!

axic commented 3 years ago

Instead of special opcodes it is also possible to use special addresses to exchange data (i.e. precompiles or system contract on some chains). If you have complete control over your system, that may be a nicer way because all EVM compatible languages (Vyper, Fe, etc.) could be made to work with it without changes.

axic commented 2 years ago

As mentioned we wanted to have this enabled originally, but the main problem here is somehow signalling what commitments the verbatim code makes against various EVM properties (e.g. "reads state", "modifies state", "terminates", etc.)

Since we have the memory-safe annotation, we could think about introducing more of such annotations (eventually somewhat resembling SemanticInformation). In that case would could enabled it "rather safely".

Additionally we could think about introducing a "clobbered variables" list.

brockelmore commented 2 years ago

looking forward to this! so why not just do analysis on the ops to ensure no stack manipulation deeper than the "inputs" in the verbatim name & ensures that only 1 element is added to the stack at the end? if you have that guarantee + memory safety, while you dont get "full" mutability of the execution context, you do get a decent ways safely

if you want to enable this sort of check maybe something like: assembly ("memory-safe stack-safe") { verbatim_1i_1o(<pop, push(1) ops>, a}

And if you don't have stack-safe all bets are off on optimizations? Here it would be trivial to look op by op and ensure its stack safe, and if you break the safety contract there is a compiler error. And you can also easily run an analysis to check that storage purity in solidity matches verbatim opcodes. I think at this low level its fine to force users to enumerate the safety they are guaranteeing the compiler, i.e. one could imagine:

assembly ("memory-safe stack-safe jumpless storage-pure") etc.

To be clear I am not as knowledgeable on the compiler as y'all obviously, just my 2 cents

hrkrshnn commented 2 years ago

so why not just do analysis on the ops to ensure no stack manipulation deeper than the "inputs" in the verbatim name & ensures that only 1 element is added to the stack at the end?

One early goal for verbatim was to support opcodes that are not yet in the EVM. Other machines like OVM1, new proposed EIPs, other EVM like chains are example use cases. So we cannot really analyze the raw bytecode in general.

User annotation is a good idea. However, we have to check what optimizations we can enable with that. The Optimizer wasn't designed with Verbatim in mind, so we'll have to really look at what steps can be broken by verbatim even with the correct user-annotations.

wschwab commented 2 years ago

It would be useful if you can give a list of additional opcodes that you would like to add this way

I'm coming in late to this conversation, but am exploring the possibility of using verbatim for testing EIP-3074. I originally posted in the Solidity forum here about experimental 3074 support, and was directed here by @cameel , who thought this to be a better route than experimental provision due to the optimizer. I've responded there since I still think experimental provision is the better path, but wanted to pursue this here too.

Optimization in our case isn't really a concern - this would be meant for prototyping and validation, and wouldn't be needed in the meanwhile for production-optimized code.

brockelmore commented 2 years ago

User annotation is a good idea. However, we have to check what optimizations we can enable with that. The Optimizer wasn't designed with Verbatim in mind, so we'll have to really look at what steps can be broken by verbatim even with the correct user-annotations.

One option is to just disallow the optimizer to be on if verbatim is used. Then slowly as the team analyzes what optimizations are possible for given annotations, they can be added. i.e.:

assembly ("experimental-ops") { verbatim_i0_o0(<AUTHCALL, EXPERIMENTALOP, etc>) }

when there are experimental opcodes, the user defines it as such. If that flag isn't present, stack analysis can be done. If it isnt present and the user uses an unknown opcode, the compiler would report something to the effect of: Unknown opcode in verbatim "0xB4". If intentional, add the "experimental-ops" flag and turn off the optimizer to use this op.

For clarity, when talking about optimizations here, I assume we mean at the contract level, not the verbatim bytecode level? In general, I don't want the optimizer touching my verbatim (see #12951) anyway. I view the user annotation as a contract between the user and the compiler - somethings are for the compiler, somethings are for me. What I mean is things like memory-safe - for the compiler, stack-safe - for me and the compiler (the compiler can check that it is truly stack safe), storage-pure - for me, the compiler can easily check if storage ops occur.

When thinking about the evm, and what optimizations would be safe (on the contract level), it feels like if the following guarantees are made it should be fine:

  1. memory-safe: "I promise to only store values at or after the value of the free memory pointer, and if I want the value to live longer, I will update the free memory pointer"
  2. stack-safe: "I promise to only affect stack elements that are passed in, and/or newly generated stack elements. I will not touch any stack elements I do not 'own'."
  3. jumpless: "I promise to not jump and break control flow analysis that may be done" (this is less evm specific, more solidity specific, so I imagine you could have a subset of self-contained-jumps that allows jumping to locations inside the verbatim block)
  4. storage-view: "I promise to not write to storage" (keeps previously sload-ed values correct)
  5. experimental-ops: "I am doing crazy stuff, I am sorry Mrs. Optimizor, go have a seat and relax"

It feels like if you have 1, 2, 3, and 4, and the verbatim code isn't modified by the compiler, the compiler can view it as an inlined function of sorts. If you don't, the optimizer takes a hike.

I may be missing some things, and if anyone things of some safety contracts that need to hold it may be worthwhile to post them here?

In my mind an MVP of this would be:

enum SafetyContractItem {
    MemSafe,
    StackSafe,
    Jumpless,
    StorageSafe,
    Experimental
}

if verbatim_block
    .iter()
    .all(|safety_item| {
        matches!(safety_item, 
            SafetyContractItem::MemSafe,
            SafetyContractItem::StackSafe,
            SafetyContractItem::Jumpless,
            SafetyContractItem::StorageSafe
        )
        && !matches(safety_item, SafetyContractItem::Experimental)
    }) {
    optimizations_possible = verbatim_block.analyze_stack_promise();
    optimizations_possible = verbatim_block.analyze_storage_promise();
    optimizations_possible = verbatim_block.analyze_jumpless_promise();
}

And otherwise throw an error if the optimizer is on. If only a subset of needed promises are made, then throw the error. This gets us most of the way there in my mind:

  1. Supports weird op codes
  2. Allows the vast majority of verbatim uses

And I want to drive home that no one expects (nor likely wants) verbatim block interiors to be optimized by the compiler so if that is a major hang up, if possible just avoid that altogether.

Again, I could definitely be off-base here, i'm just a layman trynna to get a cool feature pushed thru :).

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to inactivity for the last 90 days. It will be automatically closed in 7 days.

brockelmore commented 1 year ago

please dont let this issue die :)

robinsdan commented 1 year ago

I want use this to insert datahash opcode introduced in eip4844.

Hellobloc commented 1 year ago

verbatim is dangerous and should not be added to solidity at will. At least it needs a lot of restrictions, such as declaring it in the compilation configuration before it can be used.

A contract that uses verbatim should not be verified by the source code verifier. Thinking about verbatim can generate any bytecode's contract source code, that means I can complete the contract verification as long as I know the bytecode. This idea comes from samczsun

hrkrshnn commented 1 year ago

Yeah, verification is a good point. If we allow verabatim, any bytecode in the mainnet can be trivially verified by a corresponding verbatim_0i_0o(code).

fvictorio commented 1 year ago

@Hellobloc is your point that verbatim makes it easier to completely obfuscate the code?

I don't understand why that is a reason to not include it in solc. You can already obfuscate code today. And if you do that, a quick look at the "verified" code is enough to know you are doing that and that you are (maybe) a bad actor. verbatim would make obfuscaton easier and way harder to decode, but I don't see how it fundamentally changes things.

Actually, I see something that is different: it makes it easier to get a meaningless "Verified" checkmark in an explorer. But IMO that's not reason enough to not add a useful feature.

Hellobloc commented 1 year ago

@fvictorio I feel like you don't seem to understand what I mean. First of all this is definitely not a problem that can be found with a quick review, it's just that the attacker may not have constructed it very sophisticatedly. Secondly, this attack is not just about saying I deploy a contract with some verbatim snippets in it that I used in such a way to make you confuse. Futher, I can go verify the source code of uniswapv4's contracts, or other large projects. It's an safety issue, you can't just think this feature makes you feel good and you ask them to add it. solidity project side is very demanding on security and you can see they need to consider a lot of things when designing and I just added one of the things they considered. Secondly, I'm not saying that this feature shouldn't be added, but I don't think a contract that uses such a feature is likely to be allowed to pass source code verification.

Actually the above issue is not feasible in some versions of contract source code validation because solc adds an invalid value bytecode fe at the end of the bytecode, this value is 00 in version 0.4. I have to say here that solidity really shocked me because it prevents the problem of loose assembly for source forgery, I'm not sure if this is their design, but they must have had a lot of consideration. I think it is worth discussing whether a new invalid value ending should be added inside the yul.

And I'm curious about what you call obfuscate code because I'm looking forward to having more options for constructing source code, maybe I'm missing something.

fvictorio commented 1 year ago

That's fair, thanks for elaborating 🙂

fvictorio commented 1 year ago

And I'm curious about what you call obfuscate code because I'm looking forward to having more options for constructing source code, maybe I'm missing something.

My initial understanding of the concern was that it allowed you to verify a contract with meaningless code, which is sort of like obfuscating it. But I was thinking about the deployer doing it. I agree that this makes troll-verifying (I see a deployment on chain from a well-knwon account and immediately verify it using verbatim) super easy and something that explorers should take into account.

Hellobloc commented 1 year ago

I feel like it shouldn't throw the question to the source code verifier. Actually I feel there is a solution to generate a new invalid value ending for verbatim, I don't know if that would solve the problem. But like I said at the beginning, there needs to be at least some restrictions.

highskore commented 1 year ago

we still need this 💯

brockelmore commented 1 year ago

A proposal for fixing the verification issue:

Have an unreachable metadata that points to verbatim blocks:

contract A {
    function b() public returns (uint256) {
        uint256 x = 100;
        uint256 y;
        assembly ("memory-safe") {
            y := verbatim_0i_1o(hex"6001")
        }
        return x + y;
    }
}

Would loosely codegen into:

b_block:
PUSH(100)

verbatim_block:
PUSH(1)

// mstore and return
// .. snip ..
VERBATIM_METADATA_START
NUM_VERBATIM_BLOCKS
VERBATIM_IDENT_OP
verbatim_block
verbatim_block_len

// if more than one verbatims
VERBATIM_IDENT_OP
verbatim_block2
verbatim_block2_len

This would make it easy on an integrator like etherscan to verify that only portions of the code are verbatim and couldn't be faked (unless the author of the contract explicitly messes with it). They handle regular metadata fine so i dont think this would be a big challenge for them.

Sure this adds 3 bytes/verbatim block + 2, but i think thats fine personally.

Lohann commented 7 months ago

While solidity don't support verbatim we must be creative... I implemented this workaround that doesn't require you to manually edit the bytecode: https://gist.github.com/Lohann/3c1073d83be667a64ba62654a5b4a469

contract ExampleImpl {
    // Workaround for calling an arbitrary code from solidity
    function _verbatim() private pure returns (uint256 output)  {
        assembly ("memory-safe") {
            let ptr := mload(0x40)

            // Force a constant to be represented as 32 repeated '7E' in the runtime code.
            mstore(ptr, 0x7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E)
            // Return the result of the inline code
            output := mload(ptr)
        }
    }

    function add(uint256 a, uint256 b) external pure returns (uint256) {
        // Once we don't know the stack order, we store the two parameter in memory.
        // so the inline code can access it and sum a and b
        assembly ("memory-safe") {
            let ptr := mload(0x40) // get free memory ptr
            mstore(ptr, a)
            mstore(add(ptr, 32), b)
        }
        return _verbatim();
    }
}

contract Example is ExampleImpl {
    // OBS: The code MUST have exact 32 bytes in size and push ONE value onto the stack.
    // PUSH22 0x40 MLOAD DUP1 PUSH1 0x20 ADD MLOAD SWAP1 MLOAD ADD
    bytes32 private constant INLINE_BYTECODE = 0x7500000000000000000000000000000000000000000040518060200151905101;

    constructor() payable {
        bytes memory runtimeCode = type(ExampleImpl).runtimeCode;
        assembly ("memory-safe") {
            let size := mload(runtimeCode)
            // Efficient algorithm to inject a bytecode in the contract by
            // replace the code PUSH31 0x7E7E7E....

            // Initial search position.
            let ptr := add(runtimeCode, 32) 

            // Efficient Algorithm to find 32 repeated bytes (ex: 0x7E7E7E..) in a byte sequence
            for { let chunk := 1 } gt(chunk, 0) { ptr := add(ptr, chunk) } {
                // Transform all `0x7E` bytes into `0xFF`
                // 0x81 ^ 0x7E == 0xFF
                // Also transform all other bytes in something different than `0xFF`
                chunk := xor(mload(i), 0x8181818181818181818181818181818181818181818181818181818181818181)

                // Find the right most unset bit
                // (0x12345678FFFFFF + 1) & (~0x12345678FFFFFF) == 0x00000001000000
                chunk := and(add(chunk, 1), not(chunk))

                // Round down to the closest power of 2 multiple of 256
                // Ex: 2 ** 18 become 2 ** 16
                chunk := div(chunk, mod(chunk, 0xff))

                // Find the number of leading bytes different than `0x7E`.
                // Rationale:
                // Multiplying a number by a power of 2 is the same as shifting the bits to the left
                // 1337 * (2 ** 16) == 1337 << 16
                // Once the chunk is a multiple of 256 it always shift entire bytes, we use this to
                // select a specific byte in a byte sequence.
                chunk :=
                    shr(248, mul(0x201f1e1d1c1b1a191817161514131211100f0e0d0c0b0a090807060504030201, chunk))
            }

            // Replace '0x7E7E...' by some arbitrary code
            mstore(ptr, INLINE_BYTECODE)

            // This code can be easily extended to run any arbitrary code of any size by appending it at the end of runtime code.
            // and using the `INLINE_BYTECODE` to jump to this location. 
            return (add(runtimeCode, 32), mload(runtimeCode))
        }
    }
}
IaroslavMazur commented 6 months ago

@chriseth I was not aware that solc --ir actually produced Yul code. It looks like this could be useful for our use case. I'll explore this approach. I think we can close this issue for now. Thanks!

@dessaya, have you managed to make this work? I'm working on a similar thing, and would love to discuss where you've got with your R&D!

You can find ways to contact me on my GitHub Profile page.