ethereum / EIPs

The Ethereum Improvement Proposal repository
https://eips.ethereum.org/
Creative Commons Zero v1.0 Universal

Contract code size limit #170

Closed vbuterin closed 7 years ago

vbuterin commented 8 years ago

EDITOR UPDATE (2017-08-15): This EIP is now located at https://github.com/ethereum/EIPs/blob/master/EIPS/eip-170.md. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Specification

If block.number >= FORK_BLKNUM, then if contract creation initialization returns data with length of at least 24577 bytes, contract creation fails. Equivalently, one could describe this change as saying that the contract initialization gas cost is changed from 200 * len(code) to 200 * len(code) if len(code) < 24577 else 2**256 - 1.
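
To make the rule concrete, here is a minimal sketch of the check as a client might apply it at the end of contract creation. This is Python-style pseudocode; the function and constant names are illustrative and do not correspond to any particular client's internals.

```python
MAX_CODE_SIZE = 24576   # 0x6000; creation fails if the returned code is larger than this
CREATE_DATA_GAS = 200   # per-byte charge for storing contract code

def finalize_contract_creation(returned_code: bytes, block_number: int, fork_blknum: int) -> int:
    """Apply the size check after the init code has run, then charge the deposit cost."""
    if block_number >= fork_blknum and len(returned_code) > MAX_CODE_SIZE:
        raise Exception("contract creation failed: code size limit exceeded")
    return CREATE_DATA_GAS * len(returned_code)
```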

Rationale

Currently, there remains one slight quadratic vulnerability in ethereum: when a contract is called, even though the call takes a constant amount of gas, the call can trigger O(n) cost in terms of reading the code from disk, preprocessing the code for VM execution, and also adding O(n) data to the Merkle proof for the block's proof-of-validity. At current gas levels, this is acceptable even if suboptimal. At the higher gas levels that could be triggered in the future, possibly very soon due to dynamic gas limit rules, this would become a greater concern - not nearly as serious as recent denial of service attacks, but still inconvenient especially for future light clients verifying proofs of validity or invalidity. The solution is to put a hard cap on the size of an object that can be saved to the blockchain, and do so non-disruptively by setting the cap at a value slightly higher than what is feasible with current gas limits (a pathological worst-case contract can be created with ~23200 bytes using 4.7 million gas, and a normally created contract can go up to ~18 KB).

If this is to be added, it should be added as soon as possible, or at least before any periods of higher than 4.7 million gas usage allow potential attackers to create contracts larger than 24000 bytes.

mchlmicy commented 8 years ago

What kind of implications would this have for contracts which call each other? Would there be a limit on the number of contracts you can chain together, or would the limit apply to the total byte size of the chained contracts?

Further, is there potential for optimization in the case where contracts call each other (i.e., only read in the functions that will actually be used in the contract call stack)? I don't know how difficult something like this would be, but it might reduce VM disk reads.

Itshalffull commented 8 years ago

Hard limits can become significant political bottlenecks in governance, as we've seen with Bitcoin.

Could we add a uniform linear price increase to opcode costs that scales with contract size? Something like: `if (len(code) > 2400) { gasprice += (len(code) - 2400) / 10 }`
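
As a rough illustration of that idea only (the 2400-byte threshold and the divide-by-10 slope are the numbers from the comment above, and `BASE_CALL_GAS` is a made-up placeholder):

```python
BASE_CALL_GAS = 700  # placeholder base cost for a CALL; not part of the suggestion

def call_gas_with_size_surcharge(code_size: int,
                                 threshold: int = 2400,
                                 slope_divisor: int = 10) -> int:
    """Linear surcharge on calls into large contracts, per the suggestion above."""
    surcharge = max(0, code_size - threshold) // slope_divisor
    return BASE_CALL_GAS + surcharge
```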

yaronvel commented 8 years ago

Could you explain where the quadratic blowup is? You only describe several O(n) operations.

Edit: Question was answered here

julian1 commented 8 years ago

You only describe several O(n) operations.

Agree, O(n) + O(n) = O(n)

jbaylina commented 8 years ago

The first time a specific contract is called in a transaction should not have the same price as subsequent calls. The first call implies a fetch from disk; for subsequent calls the contract will already be in the client's memory cache.

In other words, 1000 calls to 1000 different contracts should not cost the same as 1000 calls to one single contract.

The high cost of CALL disincentivizes the use of libraries.

My proposal is to take all opcodes that access disk and split their cost into a FETCH cost and an EXECUTION cost. The FETCH cost would be charged only the first time the object is accessed in the transaction.
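
A minimal sketch of that accounting, assuming a hypothetical per-transaction cache of already-fetched code (all names and gas numbers here are placeholders, not a concrete proposal):

```python
FETCH_GAS = 500      # charged only on the first access to a contract in a transaction
EXECUTION_GAS = 40   # charged on every call; both numbers are placeholders

class TransactionContext:
    def __init__(self):
        self.fetched = set()  # addresses whose code was already loaded in this transaction

    def call_cost(self, address: str) -> int:
        """Split the call cost into FETCH (first access only) and EXECUTION."""
        cost = EXECUTION_GAS
        if address not in self.fetched:
            cost += FETCH_GAS           # pay the disk fetch once
            self.fetched.add(address)   # subsequent calls hit the cache
        return cost
```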

DominiLux commented 8 years ago

I'm for any improvements to the network during this fork. Anything that stabilizes the network and its nodes, and helps secure the underlying protocol, needs to be implemented before we can move out of this phase of the project. I'm currently working on some of my own clever bytes of Solidity but am stuck in development due to the changes in progress. So yes, I'm all for going ahead with any and all changes that benefit the future of the Ethereum protocol. I call it a protocol because so many people fail to realize that this project is going to accomplish what Java never could: applications that are written once on chain, where the developer doesn't even have to think about OS compatibility because it's a built-in feature.

I will be very excited to submit a press release for the project I'm currently working on, which will run on the public Ethereum blockchain and supply a needed service in exchange for eth tokens. I have the overall concept behind it and am now in the pre-development planning stages. Big projects have to be broken down into smaller ones.

One suggestion (forgive my ignorance if it's already being done): just as any other programming language includes a set of pre-built libraries for common tasks, perhaps Solidity should have this as well. Since Solidity is designed to run on chain, if a person were to write an open-source library to do something like find the square root of a number, that library would not need to be compiled again; it could simply be called by your code to obtain the result. By having precompiled, whitelisted libraries that are considered "nice" by the protocol's standards, that code could be used by all programmers, minimizing bloat of the blockchain. Additionally, a caching mechanism could be put in place for library contracts that are dynamically determined to be called the most across the network, to substantially speed up code execution. Think of it as a way of doing object-oriented programming on a blockchain.

Over time, as more whitelisted libraries are created, an IDE could be built that simply lets the coder drag and drop these libraries as needed; the compiled code would just be a pointer to the precompiled library that already exists on the transaction ledger. These libraries would have to be treated differently from normal contracts and be given gas exceptions to attract developers to use them rather than reinventing the wheel every time, hence the library whitelist. This design would keep Solidity as simple, from a syntax perspective, as C++, with expansion done through the addition of precompiled libraries that other code can point to. I believe this concept would fit well with sharding the blockchain, because when your code calls another library, that portion of the code would probably get executed on a different node entirely. Now that I have gotten onto the concept of sharding the blockchain, allow me to provide the solution.

How To Shard A Blockchain: Through my life's experiences, one thing I've realized is that a problem you may be facing now has, at some point, already been solved, perhaps not even in the same area as your problem. Nevertheless, the same solution can be revised to fit a new problem. The same is true of sharding the blockchain. Properly sharding a trustless blockchain requires the following:

  1. Nodes should be able to come and go as they please without interruption of the blockchain.
  2. If half the network suddenly went out at once, the blockchain MUST remain intact.
  3. At any given point, anyone who desired should be able to make a full copy of the unsharded blockchain.
  4. It must remain decentralized (which sharding would in fact naturally promote even more).
  5. It needs to allow for multithreading (a solution for this was presented in the earlier concepts within my post).
  6. It must have a track record of proven resiliency.

I'm sure there are many points I'm missing, but the solution is simple and has been around for some time. It just needs to be designed to be more dynamic.

The Sharding Solution: RAIN (at least that's the acronym for it), a Redundant Array of Inexpensive Nodes.

For this to work on a blockchain, it would need to incorporate all of the concepts of RAID, but thinking of nodes as disks. It would also need to use striping, mirroring, and parity; nodes would have to be dynamic; and the array would have to be infinitely scalable (to the extent of the available address space).

In a normal striped array you have something like this: Node1 (DataPortionA), Node2 (DataPortionB), Node3 (DataPortionC), Node4 (DataPortionD), and all data is scattered equally across all nodes. Unfortunately, this solution does not meet the criteria set beforehand.

In a striped, mirrored array it would look like this: Node1 (DataPortionA), Node2 (DataPortionB), Node3 (MirroredDataPortionA), Node4 (MirroredDataPortionB).

While this offers redundancy, it's not enough. However, when you start adding parity mixes between the nodes while they are striped and mirrored, that's when it becomes a decentralized, sharded blockchain. Below is a simplified model of this.

| Node A | Node B | Node C | Node D |
| --- | --- | --- | --- |
| Data A | Data B | Data C | Data D |
| Parity B | Parity C | Parity D | Parity A |
| Parity C | Parity D | Parity A | Parity B |
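
To make the table above concrete, here is a small sketch (my own illustration of the described scheme, with made-up names) that gives each node type its data stripe plus the parity of the next two node types, rotating around the ring:

```python
def rain_layout(node_types: list[str], parity_copies: int = 2) -> dict[str, list[str]]:
    """For each node type, return its own data stripe plus rotated parity assignments."""
    layout = {}
    n = len(node_types)
    for i, node in enumerate(node_types):
        partitions = [f"Data {node}"]
        for k in range(1, parity_copies + 1):
            partitions.append(f"Parity {node_types[(i + k) % n]}")
        layout[node] = partitions
    return layout

# rain_layout(["A", "B", "C", "D"]) reproduces the table above, e.g.
# {"A": ["Data A", "Parity B", "Parity C"], "B": ["Data B", "Parity C", "Parity D"], ...}
```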

When a node connects, it will by default try to continue as the node type it was before. However, if after connecting it determines that the balance of nodes is off, it will become a different node type and rebuild its data from the parity data that exists, instead of trying to download everything. Parity data can be compressed with efficient algorithms, and since it only contains enough data to compute and rebuild the chain, the node would not need to download very much information. Also, the node would build from the top down from the parity data, so a node could be online in seconds instead of hours; it would just lack being fully synced with all the parity data.

If the size of the balanced nodes' databases reaches a "critical mass", the system would automatically increase the number of node types. As a simple example, earlier I showed nodes A, B, C, and D. But if the database were to get too large, the system could dynamically double the number of node types to A, B, C, D, E, F, G, H, each containing a data partition and two parity partitions of other node types that it never reads from and only writes to, unless it determines it needs to become that node type based on the earlier criteria.

Conclusion: Well that should give all of you some interesting things to discuss. I hope it helps.

DominiLux commented 8 years ago

For documentation purposes, I almost left out the mirroring part. Although it is obvious, each node type would have identical mirrored node types across the network. When the database gets too large, triggering a node expansion, those would get mirrored as well. In theory you could keep the workload of a single node to such a minimal amount of storage and computational power that people wouldn't notice it running and wouldn't mind leaving it on all the time. Also, you could go for a complex algorithm for node balancing by assigning each node type a "weight" based on its density (weight and density formulas already exist, so it would be easy to accomplish). Or you could use chaos theory, assign nodes at random, and watch in amazement as the system just magically balances itself out with hardly any code ;)

DominiLux commented 8 years ago

```c
// Determines whether nodes need rebalancing based on their densities.
#include <math.h>
#include <stdbool.h>

// NodeDensity describes one group of mirrored nodes.
struct NodeDensity {
    int NodeType;   // integer identifier for a group of mirrored nodes
    int NodeCount;  // number of nodes of the specified type to be weighed
};

// Loops through the array to get the average node count, computes the standard
// deviation of NodeCount - average, and returns true if any node type falls
// outside of it (i.e. a rebalance is needed); otherwise returns false.
bool BalanceNodes(const struct NodeDensity nodes[], int numTypes) {
    if (numTypes == 0) return false;
    double sum = 0.0;
    for (int i = 0; i < numTypes; i++) sum += nodes[i].NodeCount;
    double average = sum / numTypes;
    double variance = 0.0;
    for (int i = 0; i < numTypes; i++) {
        double diff = nodes[i].NodeCount - average;
        variance += diff * diff;
    }
    double stddev = sqrt(variance / numTypes);
    for (int i = 0; i < numTypes; i++) {
        if (fabs(nodes[i].NodeCount - average) > stddev) return true;
    }
    return false;
}

// The next step would be to start broadcasting for a rebalance until a majority
// of nodes are in agreement; this gets rid of random rebalancing anomalies.
```

BlameByte commented 8 years ago

I am not a fan of putting a hard cap on contract size; I feel it will harm those who are currently making, or planning to make, larger contracts which cost 3-4 million gas. (I am currently developing such a large contract and have been unable to deploy it due to the current 2m gas limit, so I am unsure if it would be affected by this change.)

Instead, would it not make more sense to charge more gas to load a bigger contract? I feel that a hard 24kb cap on contract code will likely become an issue in the future.

I am aware there would be workarounds, such as making multiple contracts or storing code in storage, but both of these would make such a contract considerably harder to develop and would increase the fees dramatically. They would also make the code harder to verify, since you would have to review multiple contracts, keep the split in mind while developing, pass variables between contracts, and add checks to make sure each call is coming from the correct place.

So while I understand this might become an issue if someone were to create several large contracts and call them, I feel it is more harmful to contracts currently under development (like mine), which would likely need considerable changes to be split across different contracts (it already needs 3), plus the added work for future development and the need to review multiple contracts instead of one.

Personally, I feel that contract development should be as easy as possible to encourage more developers to join in, and imposing hard limits on code or storage would make development harder and would likely cause a lot of confusion. Reviewers would have to look at multiple contracts and make sure that the correct variables are being passed and that only the main contract can make calls.

I just feel that this will bring more harm than good. I am aware that most contracts will be under the limit, but should we really be punishing those trying to make something big and advanced?

wighawag commented 8 years ago

Just weighing in with my own use case: I am also developing a contract that currently costs around 3.1 million gas, and looking at the code I do not consider it a very complex one. Breaking it up would complicate it for no good reason. Anything that forbids big contracts is not very welcome from my point of view :)

Smithgift commented 8 years ago

I, too, have created gargantuan contracts. It's amazing how fast it can grow.

Some of it is alleviated by using libraries, but libraries can only do so much. If, for example, a library is being used for the pseudo-polymorphism of a data type in another contract, that library has to contain all the code for that type. I don't think it's impossible that a single library would break the limit.

SergioDemianLerner commented 8 years ago

You could make a contract a vector of 24-Kbyte pages. The first page is loaded for free when a CALL is received, while jumping or running into another page pays a fee (500 gas) because the page must be fetched from disk. This way you still maintain the possibility of arbitrarily sized contracts.
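
A rough sketch of how that per-page charge might be accounted for during execution (a conceptual illustration only; the page size and 500-gas fee come from the comment above, and all names are invented):

```python
PAGE_SIZE = 24 * 1024   # 24-Kbyte pages, per the suggestion above
PAGE_FETCH_GAS = 500    # fee for touching any page other than the first

class PagedContractFrame:
    """Tracks which code pages have been touched during one CALL."""

    def __init__(self, code: bytes):
        self.code = code
        self.loaded_pages = {0}  # the first page is loaded for free when the CALL is received

    def charge_for_pc(self, pc: int) -> int:
        """Return the extra gas owed when execution reaches program counter `pc`."""
        page = pc // PAGE_SIZE
        if page in self.loaded_pages:
            return 0
        self.loaded_pages.add(page)  # this page must be fetched from disk
        return PAGE_FETCH_GAS
```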

DominiLux commented 8 years ago

Whatever solution you choose, I am willing to donate some of my servers' resources to running a testnet for the code. I have two servers right now that are dedicated bare-metal full nodes which run 24/7 and have a 50-peer connection limit (which always stays maxed out), plus a dedicated slot for me to sync my personal node. These have zero problems with the transaction spam, and I put them in place prior to the previous fork to help support the network. However, these nodes barely put a dent in the two servers' resources. I have 8 extra IPs and could easily launch 8 VMs (4 per server) to run a small testnet. Let me know if these resources are needed to assist with testing concepts for this upgrade. An IPsec tunnel could be configured between the 8 test nodes to "jail" them from the rest of the network.

Speaking of tunnelling over a public network, having a point-to-point VPN built into the node software for private Ethereum blockchains would be a great feature to add and could attract a lot of extra developers to the network and the technology. Setting up a large-scale site-to-site IPsec tunnel securely over the public internet is expensive, though less expensive than running direct fibre from one site to another. Having a built-in way for nodes to communicate with each other through an encrypted tunnel, with minimal configuration beyond some basic command-line parameters and a shared key that could be generated by the first node and copied over to additional nodes, would make it easier for large organizations to launch their own private blockchains. I believe this could easily be integrated, because the nodes already have commonly used encryption algorithms built in as classes and functions; one would only need to call the same functions to encrypt IP packets instead of using them to brute-force a nonce (mining).

vbuterin commented 8 years ago

So, regarding paging, note that if you want to make a "contract" larger than 24kb, you can still make a contraption using delegatecall (and HLLs eventually should have functionality to make this automatic), and get an almost equivalent effect except for higher gas costs (and if pagination was done, then that same effect would exist too). I'd also argue that this limit is more like a transaction size limit than a contract size limit.

In the long term, we could do pagination, but doing that properly would require changing the hash algorithm used to store contract code - specifically, making it a Patricia tree rather than a simple sha3 hash - and that would increase protocol complexity, so I'm not sure that would actually create significant improvements on top of the delegatecall approach.
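
For readers unfamiliar with the pattern, here is a purely conceptual, off-chain model (in Python, with invented names) of the delegatecall contraption being described: a small entry-point contract owns the storage and routes each function to a component contract, each of which stays under the size cap.

```python
class Component:
    """One piece of an oversized contract; each piece stays under the 24576-byte cap."""
    def __init__(self, functions: dict):
        self.functions = functions  # function selector -> callable

class Dispatcher:
    """Entry-point contract: owns the storage, delegates the logic to components."""
    def __init__(self, components: list):
        self.storage = {}
        self.routes = {}  # function selector -> component holding that code
        for component in components:
            for selector in component.functions:
                self.routes[selector] = component

    def call(self, selector: str, *args):
        # DELEGATECALL semantics: the component's code runs, but all reads and
        # writes go against the dispatcher's own storage.
        component = self.routes[selector]
        return component.functions[selector](self.storage, *args)
```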

gavofyork commented 8 years ago

Rather than having a maximum allowable code size of 23,999 bytes, I would propose we follow the convention of the max_depth and stack_limit parameters and nominate a power of two (or at least an even) number for it.

Two options:

chriseth commented 8 years ago

Proposed update to the spec:

Specification

If block.number >= FORK_BLKNUM, then if contract creation initialization returns data with length of more than 0x6000 (2**14 + 2**13) bytes, contract creation fails with an out of gas error.

Rationale

Currently, there remains one slight quadratic vulnerability in ethereum: when a contract is called, even though the call takes a constant amount of gas, the call can trigger O(n) cost in terms of reading the code from disk, preprocessing the code for VM execution, and also adding O(n) data to the Merkle proof for the block's proof-of-validity. At current gas levels, this is acceptable even if suboptimal. At the higher gas levels that could be triggered in the future, possibly very soon due to dynamic gas limit rules, this would become a greater concern - not nearly as serious as recent denial of service attacks, but still inconvenient especially for future light clients verifying proofs of validity or invalidity. The solution is to put a hard cap on the size of an object that can be saved to the blockchain, and do so non-disruptively by setting the cap at a value slightly higher than what is feasible with current gas limits.

wighawag commented 8 years ago

@vbuterin @SergioDemianLerner I think a paging cost is the right solution here. delegatecall is not an elegant solution for some contracts. While breaking code down into libraries makes perfect sense in some cases and improves readability, this is not always the case.

BlameByte commented 8 years ago

After reviewing the contracts I have been working on, I am very close to the 24000-byte limit, and I plan on adding some user-friendly functions (such as being able to withdraw to a separate account), so I cannot support this proposal in its current state.

As for splitting up my contract, I already have 2 separate contracts used for creating various parts, which helps move some of that logic away from the main contract. As this contract is nearing completion, I do not want to increase its complexity by splitting the base contract up or omitting various parts to better suit this change.

Also, because I have not yet tested all functions to ensure they are secure, and some will likely need additional code to harden or improve them, I feel I would have to remove existing code to stay under the limit, with my only other option being to create an additional contract and complicate the logic even more.

I am open to other changes: charging additional gas if the contract is more than 24000 bytes would be a better approach, which could fix this attack and still allow developers to create what they want and be charged adequately.

I feel as though lately people in Ethereum have had little regard for those developing contracts: the miners keeping the gas limit low for the past couple of months, barring larger contracts, and now a permanent ban on such big contracts with little regard for those developing them. I know people like and want simple contracts, but some of us like to think big and really test what the platform is capable of.

chriseth commented 8 years ago

@BlameByte reach out to me on gitter; I can take a look at your contract if you want.

3esmit commented 8 years ago

So, what is the final consensus about the max size? It seems like geth & parity use 24576.

hrishikeshio commented 8 years ago

Please allow increasing the max contract size for private networks. Our Eris contracts are not deployable on an Ethereum private network because of this limit.

ethernomad commented 7 years ago

@hrishikeshio you can configure private networks any way you like

ethernomad commented 7 years ago

Why has this EIP been accepted before FORK_BLKNUM has been defined?

hrishikeshio commented 7 years ago

@ethernomad As far as I know, there is no command line option or genesis file config to increase this limit. I guess the only option is to modify the source and compile manually.

ethernomad commented 7 years ago

@hrishikeshio this repo is not specific to any particular implementation. It seems with Parity at least it is configurable: https://github.com/ethcore/parity/blob/master/ethcore/res/ethereum/frontier.json#L138

iamtheghostlove commented 7 years ago

Is this still the case?

Is there a plan to change this in the future?

Smithgift commented 7 years ago

@iamtheghostlove: It is definitely currently the case. I am personally unaware of any plans to change it.

eolszewski commented 7 years ago

@chriseth can we close this issue given this PR was merged?

cdetrio commented 7 years ago

This EIP is now located at https://github.com/ethereum/EIPs/blob/master/EIPS/eip-170.md. Please go there for the correct specification. The text in this issue may be incorrect or outdated, and is not maintained.

bwheeler96 commented 6 years ago

Is there a reason we can't make the transaction fail earlier? I just burned $80 deploying a contract that works fine on Kovan

chriseth commented 6 years ago

The transaction should "fail" early during gas estimation.
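
For anyone who wants to guard against this before broadcasting, a simple client-side pre-check on the compiled runtime bytecode is enough. This is only a sketch; how you obtain the bytecode hex depends on your toolchain.

```python
MAX_CODE_SIZE = 24576  # EIP-170 limit on deployed (runtime) code

def check_deployable(runtime_bytecode_hex: str) -> None:
    """Raise before sending the deployment transaction if the runtime code is too big."""
    code = bytes.fromhex(runtime_bytecode_hex.removeprefix("0x"))
    if len(code) > MAX_CODE_SIZE:
        raise ValueError(
            f"runtime code is {len(code)} bytes, over the {MAX_CODE_SIZE}-byte EIP-170 limit"
        )
```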

bwheeler96 commented 6 years ago

@chriseth it doesn't when using the ledger js provider :(

bwheeler96 commented 6 years ago

Disregard, I actually effed this up in a different way :(

ekreda commented 6 years ago

We are on a private network (geth based)! If this value is hardcoded, it's definitely not suitable for a private net and introduces a real drawback.

Is there a command line option or genesis file config to increase/override this limitation?

ekreda commented 6 years ago

@ethernomad the link you posted (over a year ago) for the Parity configuration for this limit is broken. Can you point us to it? We are still trying to bypass this issue...

Update: We found it; it's maxCodeSize.

New question: Does anyone know if geth uses this param too, and where we can set it?

nathanawmk commented 6 years ago

@ekreda I am having that issue as well. I tried putting maxCodeSize in, but it did not work.