Souptacular commented 6 years ago

Ethereum Core Devs Meeting 30 Agenda

Meeting Date/Time: Friday 12/15/17 at 14:00 UTC

Meeting Duration 1.5 hours

YouTube Live Stream Link

Agenda

Testing Updates.
Digital cats caused network congestion this month. Meow. a. Why did this happen and what solutions are available to prevent future network congestion? See comments below for some ideas. b. Stateless Clients proposal. c. Would having minimum system requirements to set up an optimal client/full node help? d. Is the bottleneck is not just disk bandwidth, but specifically sequential disk bandwidth? e. Vitalik has some ideas around gas cost changes and scalability-relevant client optimizations.
Plans on Quantum-resistant cryptography and any plans to include it in the next update?
Introduction to K-EVM team (Everett H.)
Does it remain the case that the Yellow Paper is intended to be Ethereum's formal specification?

Time permitting:

Parity stuck ether proposals.
POA Testnet unification [Update]
Core team updates.

Please provide comments to add or correct agenda topics.

5chdn commented 6 years ago

Shall we talk about transaction backlogs?

Just some random thoughts.

Shouldn't the block gas limit go up at this consistently high transaction load?

screenshot at 2017-12-06 10-51-33

Is there anything short-term we can do? Like recommending higher gas limits? Is it even safe to recommend higher block gas limits? If yes, what would be a reasonable limit?
Is there anything we can do to improve applications like crypto-kitties to use less gas, or anything else to relax the situation? Did anyone look into options yet?

ethernian commented 6 years ago

Is there anything short-term we can do?

Just a mid-term raw idea (not perfect, I know): We could limit gas usage (or increase min gas price specifically for heavy contracts) per contract group (contracts with the same codebase) if network becomes overloaded. A contract deployer can't easily overcome this restriction by delivering many slightly altered contracts with another codebase, because this bunch of different contracts could not be so easily trusted and accepted as the single one. Such "loadbalancing" is in tradeoff to acceptance.

v1thesource commented 6 years ago

With Cryptokitties making up >10% of all tx's currently (https://ethgasstation.info/gasguzzlers.php), the best medium-term solution may be helping them implement a payment channel mechanism. Uncles / total blocks per day is around 21%, which is not disastrous, but is only aggravated by a gas limit increase. If the gas limit is increased, we'd have to tell everyone to wait for more block confirmations per tx to make sure they get on the right chain.

Crazy idea, but at this point it may be worth looking at increasing the target time per block from ~15s. Users will have to wait for multiple confirmations anyway with the current increasing uncle rate, at least with a higher block time interval we can increase the gas limit with a reduced effect on uncle rate.

rolandkofler commented 6 years ago

@dip239 while I see the first part of the idea, the second part "contracts must be audited" is easly circumvented by adding harmless nonsense functions. And it would be lead to batteries of loadbalancer contracts anyway.

AlexeyAkhunov commented 6 years ago

I would like to bring up the Stateless Clients proposal, as I described here: https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9

I am collecting more data now about how much impact it can make, and what is the overhead, hopefully can present something very briefly

ethernian commented 6 years ago

part "contracts must be audited" is easly circumvented by adding harmless nonsense functions

My idea is not perfect, I agree, it is more the way of thinking about the problem: I am just trying to punisch an excessive gas consumption by target contracts instead of gas provisioning by transactions.

Nevertheless my point was, that "loadbalancing" will not working "for free": a careful user needs to trust N contracts instead of single one if their codebase is not identical. Personally I wouldn't trust a bunch of loadbalancing contracts with "almost" the same code: too much to check every single one. But CryptoKitty players possibly do not care about contracts they trust at all.

coinaisseur commented 6 years ago

Whatever solution we adopt, we can all agree that this is an emergency situation that must be solved short term. With the 'accidental' success of CryptoKitties, we can assume there are a bunch of developers coding Ethereum dApps right now as I write this message, so this transaction backlog will only get worse from now.

ethernian commented 6 years ago

Thought more: ... there should be some "central contract" behind the "loadbalancer", coordinating the whole application. We could sum all gas burned in all transactions going through this "central contract" in some time frame (TxGasBurningRate for this contract). If the network is currently overloaded AND some contract is involved into excessive gas burning, all transactions going throw it should be deincentivized by higher gas price.

further discussion is moved to ethereum/research

ghost commented 6 years ago

Might be missing something obvious here: Why do we have a static blocktime target, variable gas limits, and (a more abstract) acceptable uncle rate (which is actually variable). Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target? (or uncle/time rate to keep it fair for miners)

vbuterin commented 6 years ago

Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target?

The blocktime target is flexible as of Byzantium, to keep total rewards roughly constant. See it rising slightly here: https://etherscan.io/chart/blocktime

I personally oppose further blocktime increases. The contribution of the fast blocktime to the total uncle rate is relatively small, and furthermore it's ADDITIVE, not multiplicative, with contribution to uncle rate from capacity. That is:

uncle_rate ~= k1 / blocktime + k2 * gas_per_sec

This is confirmed with bitcoin in Decker and Wattenhofer's 2013 paper, and experience suggests the same is true with ethereum. Right now it's the second term in the sum that is the problem, not the first.

IMO we should consider a few optimizations:

Do another round of increasing gas costs on account-accessing opcodes (BALANCE, EXTCODESIZE, etc), and SLOAD, as that's still our major weak point from the PoV of DoS resistance. I'd recommend SLOAD -> 320, BALANCE -> 800, EXTCODESIZE/CALL/CALLCODE/DELEGATECALL/.... -> 1200. But we should add an exception, that self-calls and calls to precompiles cost only 100.
Some variant of https://github.com/ethereum/EIPs/issues/168 and https://github.com/ethereum/EIPs/issues/169 to alleviate state size growth
Increase the cost of sending a tx by 30000 if it goes to a currently empty account

I also totally support the idea of stateless clients. Right now it actually already is possible to implement without any core protocol changes, as long as miners are stateful. There's also the possibility of a "stateless partially full node" - be a light node by default, but fully (statelessly) verify specific blocks if a trusted server tells you that they're invalid. This gives the security model that you won't accept an invalid block unless BOTH (i) there is an active 51% attack, and (ii) all trusted servers you're connecting to are colluding.

Also, it would make sense to have a much more coordinated benchmarking effort, so we can see what opcodes are currently the slowest, and what can be done to improve their execution speed.

Finally, we should have a poll on where we are at for key scalability-relevant client optimizations. This includes:

Garbage collection
On-disk state caching
State tree pruning
Network compression
Database optimization

tejasriram commented 6 years ago

I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?

RSAManagement commented 6 years ago

Hallo, I would like the foundation to recommend the minimum system requirements to set up an optimal client/full node. This is probably a basic step to mitigate a bit the uncle rate problem, it seem that the hard drive is one of the most important bottleneck given the high number of I/O calls to the database. https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9

vbuterin commented 6 years ago

I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?

Properly incorporating this requires account abstraction, which is going into the sharding spec; I don't think there is yet consensus on how/when it's going into the main chain. Abstraction will also be available for Casper validators.

vbuterin commented 6 years ago

See my comment on stateless client numbers here:

https://medium.com/@VitalikButerin/regarding-bandwidth-requirements-for-stateless-clients-i-can-give-some-precise-numbers-be357fb69b6d

vbuterin commented 6 years ago

I do have a question that I'd like to hear answered as well as possible.

It seems to me that the bottleneck is not just disk bandwidth, but specifically sequential disk bandwidth. That is, for example, if we somehow magically knew ahead of time what state tree nodes need to be accessed, and we could make the accesses happen in parallel, then processing speed could be increased greatly.

First, is this true? That is, is it the case that loading 1000 specific state trie keys from the DB in parallel is much faster than doing it sequentially? Second, if so, how much faster?

If there are substantial gains to be made, then there are clever things we can do, like requiring miners to provide a witness specifying what accounts and storage keys get accessed in the block, and additionally it means that there are potential great scalability gains in EIP 648.

AlexeyAkhunov commented 6 years ago

@vbuterin Thanks a lot of the answers! I am still trying to do the full mode sync of geth, and now I hit the road block because my SSD is only 500Gb and doing it on HDD is simply too slow, so I am currently stuck around block 4.5m - 9th of November 2017 :). That is why I am trying to optimise geth a bit. But I have managed to compute the sizes of the witness for the blocks around DoS attacks in September 2016. Very often, the witness would be like 37Mb. I have not analysed yet why.

Regarding your second question about parallel reads from the DB, I also thought about it and I looked at how exactly geth (and parity too) organises the accounts and their storage - I will prepare a blog post on that, because it also explain how I calculated the witness size. I also looked at LevelDB implementation that geth uses to see if there is any gain from concurrent reads. I doubt there is. Because of the way the data is stored, there is no locality, and data even from the nearby trie branches are randomly scattered across the whole database. So reading them in parallel would require loading more LevelDB blocks into memory and seeking them.

pirapira commented 6 years ago

@Souptacular About KEVM, Everett and some of his colleagues will be joining the call. So give me an agenda item: "introduction KEVM (Everett)". It would fit nicely before the YP discussion.

AlexeyAkhunov commented 6 years ago

@vbuterin Actually I take it back - I think there will be improvement in trying to access trie nodes in parallel. Because currently lots of time is spent in navigating down the trie, reading lower level only after the higher one. And that exacerbate the high latency of HDD/SDD. I will definite try that. Another thing we could do is only include parts of the keys in the "witness hint", lets say, only first 8 bytes instead of all 32, and use non-exact seek operation to read from DB. I will look into that too.

@pirapira That is great! I have read KEVM paper after DevCon3 and will be curious to hear the discussion

pkieltyka commented 6 years ago

Im just following the discussion regarding data storage - I highly recommend the embedded db https://github.com/dgraph-io/badger which is a RocksDB implementation in pure Go. It's very robust, tested, and supports concurrent reads, ACID transactions, batching and snapshots. The original RocksDB btw is a fork of LevelDB by Facebook with more concurrency features/tuning - so I expect the work necessary to replace geth's existing use of github.com/syndtr/goleveldb/leveldb to badger will be quite minimal. The benefits: more performance, no more CGO for the db (leaks? call penalty?), and maybe disk space too depending if there is any data compaction in geth's db (to release old unused space from deleted/changed entries), or opportunities for compression.

AlexeyAkhunov commented 6 years ago

@pkieltyka I have encountered BadgerDB yesterday and it looks interesting. Another thing to try, thanks!

5chdn commented 6 years ago

Hey @pkieltyka just FYI, we have massive issues with RocksDB and are currently in the process of replacing it in Parity. https://github.com/paritytech/parity/issues/6280

pkieltyka commented 6 years ago

@5chdn I wasn't suggesting to use RocksDB, I suggested to evaluate Badger, an alternative implementation of a LSM in pure Go, inspired by RocksDB. I don't think that issue applies here.

vbuterin commented 6 years ago

I just synced geth, parity and harmony over the last few days to see how they are handling the load.

Here is my feedback. I ran this on Ubuntu 17.10, with a 512GB SSD with 16 GB RAM; in all three clients I used the appropriate setting to set the cache size to 6 GB.

Parity - the warp sync feature failed outright (never even once downloaded a single chunk), and the client did a full sync. This finished after ~2.5 days (not constant online, there were a few offline periods). The processing speed was sometimes ~25-40 mgas/s, and sometimes ~5 mgas/s (see https://github.com/paritytech/parity/issues/7258). Storage size is 41 GB.
Geth - the client randomly crashed the first couple of times I ran it, and then the third time it managed to download all the block receipts/headers and concentrated on downloading the state, and that time it worked. Took ~8 hours, with a total of ~50 million state objects. When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-6 mgas/s. Storage size is 47 GB.
Harmony - the client successfully did the fast sync, in ~8 hours, with a total of ~60 million state objects (maybe harmony counts contract code as a state object and geth doesn't, or something similar? not sure what is causing the disagreement; both times it synced around block 4.7m). When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-8 mgas/s. Storage size is 25 GB.

Thoughts at first glance:

We should really look into DB optimization
All clients should bump up the default cache sizes
We need to fast sync work more reliably, and particuarly make it not lose progress if the user closes the client halfway through the fast sync process

AlexeyAkhunov commented 6 years ago

Yes, I managed to do the fast sync too. But not the "full" sync mode. Never mind, I have now ordered 4TB SSD, should arrive in a couple of days :)

5chdn commented 6 years ago

@vbuterin yes, the warp issue is a well-known annoyance. https://github.com/paritytech/parity/issues/6372

LefterisJP commented 6 years ago

I have now ordered 4TB SSD

@AlexeyAkhunov Oh damn yeah. Need that too. Any model recommentation?

@vbuterin I have not tried Harmony, but I have similar experience with geth and parity.

One other thing that would be really really nice, but probably quite difficult to achieve, is make it possible to do a sync in an HDD. I have tried to do mainnet syncs in HDDs many times. Fast/warp works fine (after many many retries), but after finishing it an HDD just can't keep up with the network with neither parity nor geth.

AlexeyAkhunov commented 6 years ago

@LefterisJP

Any model recommentation?

I chose Samsung 850 EVO, but cannot recommend it until I use myself :)

make it possible to do a sync in an HDD

I am trying to hack together a version of geth that can do that. That is what I have spent most of my time last few days... Otherwise we would have lost the ability to run full nodes without SSD

cslarson commented 6 years ago

EIP648 (Easy parallelizability) was brought up on reddit and there was some hope there might be some discussion of it during the dev meeting. Where/If it fits into the roadmap would be great to hear.

RSAManagement commented 6 years ago

I would like to add some more observations : 1- It seems to me that the uncle rate is partially related to reaching the block gas limit and to growth of the mempool size. So probably raising carefully the block gas limit could lower a bit the uncle rate. (in the short term)

Question is: How much does it cost in terms of computational stress (time to manage and broadcast a block) the mempool management when the the gas limit is reached ? Is it something to do in this specific area?

2- the actual uncle rate is high (about 26%) but lower than the 33% we reach a couple of weeks ago when the gas limit was 6.7 mil. (now is about 8 mil.).

holiman commented 6 years ago

@pkieltyka yes have been looking into badger, and done some experiments. Orignally, I think a major blocker was that badger did panic on every fault, instead of surfacing errors. IIUC, that's been changed now, and we've done some more experiments. @fjl knows more, here's the first experiment from May this year: https://github.com/fjl/go-ethereum/tree/badger-exp

fjl commented 6 years ago

Badger works, but it's not a lot faster than leveldb. The other thing to keep in mind is that badgers approach (keeping keyspace separate from value space) is only beneficial on SSDs.

pkieltyka commented 6 years ago

@fjl Badger has iterated a lot since 7months ago when you made your badger-exp branch. It’s prob worth upgrading the dep and trying again. True it is optimized for SSDs but worth benching as well on an HDD if that’s an important requirement.

AlexeyAkhunov commented 6 years ago

Just to leave it here. There are 3 things I am trying to do with geth to create an optimised version (that will also hopefully work with HDD):

Disable background miner unless the miner is enabled (currently it is still running)
REMOVED While processing blocks, not to write state to disk in the middle of the block, even for pre-Byzantium blocks. Currently, whenever state.IntermediateRoot() is called, it forces disk write of the trie. At the end of each block, there is a batched write to the disk, only that one should be performed.
Write/read state to/from disk as key-value pairs directly as well as trie-structure, which will require order of magnitude fewer reads

karalabe commented 6 years ago

Geth does not write to disk during transaction processing, it keeps the trie in memory. Only statedb.Commit writes to disk, called once per block.

AlexeyAkhunov commented 6 years ago

@karalabe Yes, you are right, of course. I am removing number 2

axic commented 6 years ago

@vbuterin:

But we should add an exception, that self-calls and calls to precompiles cost only 100.

I like the idea of different cost for self-calling, because there is no cost for loading the code again, but the rate compared to the "regular" call should be determined more carefully since it still needs to handle state changes (applying or rolling back depending on call outcome).

I'd be against subsidising precompiles even more.

AlexeyAkhunov commented 6 years ago

@Souptacular @vbuterin I can do experiments with parallel SSD reads if you want (point 2e)

Souptacular commented 6 years ago

@AlexeyAkhunov That would be great! Thanks.

cdetrio commented 6 years ago

Notes from this call are here https://www.reddit.com/r/ethereum/comments/7khro1/notes_from_ethereum_core_devs_meeting_29_120117/.

tomachinz commented 6 years ago

Can anyone give 1 reason why block gas limits need to be so low? Shouldn't "mining" be 90% useful and only minimal wastage of electricity. I mean gigahashing is the dumbest way to heat this already scorching planet to oblivion ever invented (bless you Satoshi), so a higher block gas limit must surely only help reduce power usage at the expense of miners (screw them anyhow).

I understand that running the transactions likely takes between 0.1 and 2 seconds out of the 17 second block time for Ethereum mainnet. Block gas limit should be targeting somewhere between 50% and 75% time utilisation (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). THAT would be mining.

karalabe commented 6 years ago

Uncle rate.

RSAManagement commented 6 years ago

@tomachinz ethereum is actually having issue to scale on chain because of the time nedeed to validate a new block. The problem seems to be the heavy load of I/O on the hdd/ssd . Geth 1.8.0 could eventually mitigate this issue with a far better DB setup, there are other projects on their way such as TurboGeth ( @AlexeyAkhunov ). We will see (when there will be another tx demand peak) if geth 1.8.0 (and/or other clients) allow more block gas limit. Anyway sooner or later ethereum will hit some bandwidth limit (to maintain decentralization) that imho is much hard to solve (and probably there is some bandwidth problem right now).

Uncle rate is a good measure of the difficulty of ethereum to remain decentralized with increasing use of network resources.

antonio-fr commented 6 years ago

"Block gas limit should be targeting somewhere between 50% and 75% time utilization (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). " So you're proposing to miners they would perform more work for less revenue? Are you personally okay to do work more for less money? This is very good for Ethereum to be heavily used. But on top of bringing viable answers, the solutions to scale up need also to be decentralized. It is much easier to bring fast and simple solutions that are paving the way for a centralized system. A decent ETH node already requires a 32GB machine and 512GB SSD. So for the scaling, please be cautious about the decentralization. The fastest path is to build a centralized database controlled and ran by a few. This is not what to expect from this project.

ethereum / pm