ethereum-optimism / optimistic-specs

Optimism: Bedrock is a protocol that strives to be an extremely simple optimistic rollup that maintains 1:1 compatibility with Ethereum
MIT License

chain history should be unified to the beginning #297

Open saurik opened 2 years ago

saurik commented 2 years ago

So, I have been trying to set up an archive node for work on a chain explorer / analysis engine recently, and, in addition to the inability to replicate the current genesis state (which I reported on Discord), I have noticed something deeper about Optimism that I truly believe needs to be considered a bug: the premise of "regenesis" acts like a "hard spoon" of the network instead of a "hard fork", and will break dapps that rely on history.

So like, normally, the way chains work is they do a "hard fork" of their chain occasionally to correct issues or make changes, but the history is then unified going back through the earlier states. The code is adjusted to be like "after this block, we change the rule to X" or "at this block, we made a bunch of one-off state changes". The result is that you can still verify--through all the changes--back through to the genesis block.

Optimism has instead taken a path where they have at least once (and I am pretty sure at least twice) thrown away the history of the chain and then just taken the current state as a snapshot for what is effectively a new, unrelated chain. This might seem equivalent, and it sort of is from the perspective of the on-chain contracts, but it is NOT equivalent from the perspective of the full stack of software that uses those contracts, as that software normally has access to that history.

So what has happened is that not only is everything before November 11th missing if you go on the block explorer and attempt to go back in time... if you use the JSON-RPC API to query for logs that were emitted before November 11th, those also do not show up! Logs are a critical part of the platform, and are used by dapps for any number of bookkeeping and history displays :(... but if you have a contract on Optimism that was deployed before November 11th (which I thankfully did not), all of its logs are effectively--to your dapp using the API--now gone.

Meanwhile, even as someone attempting to do analysis and willing to put in a bunch of legwork behind the scenes, I find this extremely frustrating, as it seems like the code to even load the state snapshots for the prior regenesis has been removed from current versions of l2geth, so I guess I am going to have to run an old (static / dead) version of your software to get access to all of this data (and am having to do a lot of "software archaeology" to figure out how to even reconstruct these past states correctly).

What I claim you should instead do is rebuild all of your chain state again... "once and for all", as it is clearly all still there on L1. Any "surgery" you did to the states shouldn't then be an external program written in TypeScript which munges a JSON file to load into a new chain: it should be in l2geth so that it can be part of the consensus on a unified chain history that goes back through to when you launched. This way, everything would work "as expected"--as you are trying to claim EVM compatibility--and would be more maintainable going forward.

karlfloersch commented 2 years ago

Thank you so so much for this incredibly thoughtful issue! I absolutely agree that regenesis is terrible and shouldn't be done anymore. We've recently publicly committed to never regenesis-ing again here - https://optimismpbc.medium.com/all-gas-no-brakes-8b0f32afd466 - so it won't happen again!

The main tradeoff which we've had to consider, & the reason why we've opted for regenesis, is that the node software has gone through a number of pretty significant iterations as we've improved our fraud proof tech over time. This means that in order to support old versions of the software, we'd have to incur a lot of tech debt. However, now that the chain is beginning to stabilize, we recognize that removing the ability to sync historical data is simply unacceptable.

Even though we will never regenesis again, there are some open discussions on exactly how to perform upgrades in the future; feel free to follow them or chime in here! - https://github.com/ethereum-optimism/optimistic-specs/discussions/94 . We're still in early stages of discussion and will have to iron out lots of specifics & are definitely going to take in all of your awesome feedback!!!!

Regenesis is a bug

saurik commented 2 years ago

@karlfloersch The problem is that the history actually is part of the state, and so your current chain now just has a massive scar that has potentially-permanently broken everything deployed: while a contract on chain can't directly access this history, the dapps that developers build to use these contracts can and do. You thereby need to treat this as not just "something we shouldn't do again" but something you need to retroactively go back and fix (unless of course you just want to treat the entire network as it currently stands like a testnet and then re-deploy a good version starting at a clean slate again in the future... but I doubt that's on the table for you).

In my case, if my dapp had been deployed before your "regenesis" (which thankfully we had not), then any accounts that were created before that point would simply not be findable in the UI, and even if the user were to manually enter their account identifier, all of the transaction history for the account wouldn't display correctly. This is because dapps cheaply access changes to account state using the EVM logs feature. (You can't use transactions, as your contract might have been called by another one, and using storage would be extremely expensive for UI state.) I'd be receiving complaints from my users about this until the end of time.
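To make the log-based pattern concrete: a dapp typically rebuilds an account's history by asking a node for the logs its contract emitted, using the standard `eth_getLogs` JSON-RPC method. The sketch below only builds the request body; the contract address and event topic are made-up placeholders, and `build_get_logs_request` is a hypothetical helper name. The point is that `fromBlock` can only reach as far back as the node's chain actually goes, so on a regenesised chain nothing emitted before the regenesis comes back.

```python
import json


def build_get_logs_request(address, topics, from_block=0, to_block="latest", request_id=1):
    """Build a JSON-RPC 2.0 eth_getLogs request body.

    Integer block numbers are hex-encoded (e.g. 0 -> "0x0"), as the Ethereum
    JSON-RPC API expects; string tags such as "latest" are passed through.
    """
    def encode_block(block):
        return block if isinstance(block, str) else hex(block)

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [{
            "fromBlock": encode_block(from_block),
            "toBlock": encode_block(to_block),
            "address": address,
            "topics": topics,
        }],
    }


# Hypothetical contract address and event topic, for illustration only.
req = build_get_logs_request(
    address="0x0000000000000000000000000000000000000001",
    topics=["0x" + "ab" * 32],
)
print(json.dumps(req))
```

Asking from block `0x0` is how a dapp says "give me everything"; after a regenesis, block 0 is the new genesis, so older logs are silently absent rather than reported as missing.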

Meanwhile, your users are clearly upset that the history is missing in things like the block explorer and are bringing it up constantly on Discord, which means you haven't even solved your concern about "incur[ring] a lot of tech debt", as the people working on block explorers are still going to have to be able to access old history state and so we're having to play software archaeologist to figure out what you did :(. You should take note that other blockchains--including Ethereum itself--despite having made tons of changes that require littering their code with backwards compatibility kludges that activate and deactivate at various points, have a unified history.

(And so, to be clear about it: while I haven't yet analyzed whether a "squash fork" as you are calling it would work or makes sense--and it very well might: the key thing might just be maintaining the transactions and receipts--you need to not just do it with a unified history going back to November 11th, 2021... you need to first rebuild the state going back to whenever you actually started this project, connect up all of those blocks correctly, and then build a resulting unified chain that you can use as the new beginning for your "squash". Otherwise, you are just dooming all of that state and all the people who adopted your chain before November 11th to confusion.)

saurik commented 2 years ago

WOW, by the way... I just read that EIP-4444, and the JSON-RPC changes section is extremely unfortunate :/. Based on some of the earlier paragraphs, I'd assumed that the premise was that JSON-RPC requests for old data would require manually fetching the older state, but they really are just intending to kind of slough it off? :(

After this proposal is implemented, certain JSON-RPC endpoints (e.g. like getBlockByHash) won’t be able to tell whether a given hash is invalid or just too old. Other endpoints like getLogs will simply no longer have the data the user is requesting. The way this regression should be handled by applications or clients is out-of-scope for this proposal.

I guess the thing I'm most confused by is that I can't imagine that the block headers and receipts are taking up that much space... I'd believed (though I currently don't know: I'm intending to learn this as I finish building out the fleet of archive servers for various projects I've been working on this past week) that the vast majority of the space was being taken up by the state trie (which they are not trying to drop, at least not in this proposal)... but they seem to be claiming that almost half the space is being taken up by the headers and the receipts?

(FWIW, your decision does make a lot more sense in the context of this proposal to Ethereum. I still feel like you should at least make canonical decisions with respect to how that old data is represented in the context of your full historical state, however, because without them, people who build block explorers are left in the lurch and are going to have to make somewhat arbitrary and potentially incompatible decisions.)

saurik commented 2 years ago

@karlfloersch OK, I've come up with a way to describe the issue I'm still seeing: in the Ethereum EIP-4444 proposal, nodes can technically still have all of the block headers and receipts and be able to respond to those queries as the history is intact... they merely can't reproduce it by re-executing the state. With the current status of Optimism (looking backward, not forward)--without taking the time to go through and do a final pass of unifying all of the history into a single unified timeline--that wouldn't work: the history is just gone, not static.

karlfloersch commented 2 years ago

With the current status of Optimism (looking backward, not forward)--without taking the time to go through and do a final pass of unifying all of the history into a single unified timeline--that wouldn't work: the history is just gone, not static.

Yes agreed this is a significant issue & we won't be regenesis-ing in this way again. I do wonder if there's something we can do with the old regenesis data that would make it a bit more accessible. The tricky bit is that the block numbers reset after every regenesis, which is going to make querying this historical data very annoying. Not sure if there's an elegant fix for this historical data to be packaged within default node software (open to suggestions), but we're 100% on the same page that erasing history in this way will never happen again.

saurik commented 2 years ago

@karlfloersch FWIW, my suggestion here is actually to do another "re-genesis" (but not really: in some sense, the opposite) with the goal of building a unified history that ends with a new chain that includes all of the blocks ever, going back to the actual clean-ish starting genesis block. This would mean all of the blocks from Optimism 5 would have block height numbers that are much larger than they currently have, as they would all get re-numbered after the Optimism 4 blocks, which in turn would be re-numbered to come after the Optimism 3 blocks. The idea is to sit down and actually work through what the history should have been if this project had had a consistent history from the beginning, in some sense erasing all of the prior re-genesis events (and so this isn't actually a new re-genesis, it is more of an un-genesis).

This would feel a bit complex when it comes to the NUMBER instruction, as semantically it would have been returning a value different than the block it was actually in, but I'm not actually sure if that instruction worked before and frankly if it did you've already broken that horribly: that value should, at minimum, be monotonically increasing, and yet contracts that are already deployed are going to have seen the same number multiple times. (And somehow your chain even seems to have had timestamps go backwards in time during Optimism 4, so there are probably contracts that have horribly-broken overlapping state due to all of this mess). I'd just say "there were multiple sequential periods in the early chain history where the NUMBER instruction returned values with an offset from the final unified block number".
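The renumbering described above amounts to concatenating the eras: a block's unified number is its era-local number plus the total length of all earlier eras, and that same sum is the constant offset by which NUMBER lagged the unified height during that era. This is a minimal sketch under loudly stated assumptions: every era numbered its own blocks from 0, the era lengths are known, and the lengths used below are invented. The real regenesis boundaries may not line up this cleanly.

```python
from itertools import accumulate


def era_offsets(era_lengths):
    """Unified block number of the first block of each era.

    Assumption: era_lengths[i] is the number of blocks in era i, and each era
    numbered its blocks locally starting from 0.
    """
    return [0] + list(accumulate(era_lengths))[:-1]


def unified_number(era_lengths, era, local_number):
    """Map a block's era-local number to its number on the unified chain."""
    if not 0 <= local_number < era_lengths[era]:
        raise ValueError("local block number outside this era")
    return era_offsets(era_lengths)[era] + local_number


def number_opcode_offset(era_lengths, era):
    """How far NUMBER, as observed inside `era`, lagged the unified block number."""
    return era_offsets(era_lengths)[era]


lengths = [100, 250, 400]  # hypothetical era sizes, not real chain data
print(unified_number(lengths, 2, 5))     # → 355
print(number_opcode_offset(lengths, 1))  # → 100
```

The offsets list is exactly the per-era correction that the "NUMBER returned values with an offset" footnote would need to record.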

karlfloersch commented 2 years ago

@saurik this is a great idea!

I think one way to fix this near term without doing all of this historical work up front could just be that during the next upgrade we increase the block number to be the "proper" block number so at least from that point forward we start creating blocks at the "proper" height. It would make creating this new frankenstein node wayyyy easier in the future.

I'm pretty bullish on us increasing the block number during the next upgrade. Are there any other protocol changes that we need to make now that would make creating the frankenstein node easier in the future? I can't think of anything major other than increasing our block number.

Btw thank you for all of your incredibly thoughtful design suggestions!

smartcontracts commented 2 years ago

Moving over to optimistic-specs for further discussion

norswap commented 2 years ago

I completely agree renumbering blocks was a terrible idea. I'm sure some people were burned by this (I haven't heard about applications, but at the very least infra providers were hampered by it, and a lot of people got very distraught at tax filing time).

That being said, renumbering blocks is still a terrible idea, and so shifting the blocks in the [re-genesis, Bedrock] interval forward when we switch to Bedrock seems very harmful to me, even more so now that the chain usage has increased a lot.

We can't fix harm done in the past by causing more harm in the present, and I think whatever problems apps had with this in the past, they've already worked around. There is little to gain in this renumbering, and a lot to lose imho.

I do think we need a good solution to make the previous blocks accessible through a simple familiar API. I think the daisy-chain RPC server could provide an answer.

The fundamental difficulty is that we now have duplicate blocks with the same number (pre-regenesis-era and regenesis-era). I think we could fix this by adding an optional flag to every JSON-RPC request to the RPC server which says whether the number is to be interpreted as pre-regenesis or post-regenesis (defaulting to post-regenesis if absent).

Alternatively, we could have a switch (off by default) that turns on "shifted numbering": where numbers < REGENESIS_HEIGHT are treated as pre-regenesis-era numbers, and numbers > REGENESIS_HEIGHT are shifted down by REGENESIS_HEIGHT and then treated as post-regenesis-era numbers.
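Both options can be sketched as one resolver that an RPC server sitting in front of the old and new nodes might call before routing a request. This is a hypothetical sketch: the function name, era labels and parameters are illustrative, and `regenesis_height` stands for whatever the pre-regenesis chain's height actually is. The proposal above does not say what happens to a number exactly equal to `REGENESIS_HEIGHT`; the sketch assumes it maps to post-regenesis block 0.

```python
from typing import Optional, Tuple

PRE, POST = "pre-regenesis", "post-regenesis"


def resolve_block_number(number: int, regenesis_height: int,
                         era: Optional[str] = None,
                         shifted: bool = False) -> Tuple[str, int]:
    """Return (era, era-local block number) for a requested block number.

    Option 1: an explicit per-request `era` flag; if absent, post-regenesis.
    Option 2: `shifted` numbering (off by default), where numbers below
    `regenesis_height` are pre-regenesis and the rest are shifted down.
    """
    if era not in (None, PRE, POST):
        raise ValueError(f"unknown era flag: {era!r}")
    if shifted:
        if era is not None:
            raise ValueError("use either the era flag or shifted numbering, not both")
        if number < regenesis_height:
            return PRE, number
        # Assumption: number == regenesis_height maps to post-regenesis block 0.
        return POST, number - regenesis_height
    return (era or POST), number


print(resolve_block_number(42, 1000))                # ('post-regenesis', 42)
print(resolve_block_number(42, 1000, era=PRE))       # ('pre-regenesis', 42)
print(resolve_block_number(1042, 1000, shifted=True))  # ('post-regenesis', 42)
```

Defaulting to post-regenesis keeps every existing client working unchanged; only clients that opt in, via the flag or the switch, ever see the older blocks.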

@tynes Does that seem feasible / a good idea? @K-Ho For confirmation that a new block renumbering is a terrible idea / user-side perspective.

@karlfloersch Do we actually have two regeneses in the history? Did we renumber blocks more than once?

norswap commented 2 years ago

@saurik You probably figured this out by now, but just in case:

I'm intending to learn this as I finish building out the fleet of archive servers for various projects I've been working on this past week) that the vast majority of the space was being taken up by the state trie (which they are not trying to drop, at least not in this proposal)... but they seem to be claiming that almost half the space is being taken up by the headers and the receipts?

On L1, it's about half-half, assuming the state is fully pruned (i.e., no nodes that belong to old state tries but not to recent-ish state tries; the latter have to be kept to handle re-orgs efficiently).

The most recent figure I heard is about 150GB for each of those. (And if you want to store the state unmerkleized, that's about 40GB.)

saurik commented 2 years ago

I mostly just want to point out that if you expect every single dApp and every single tooling provider to add weird special cases for your RPC server, I think you are overestimating your influence (in a way not dissimilar to assuming everyone will recompile or even re-code their contracts to run on OVM 1.0). For all those people trying to understand their full transaction history using some off-the-shelf tax tooling or working with some block explorer, they are going to care that the software and system work correctly as of when they go to use them, not that there was some memory of pain in the past. I 100% agree that fixing this is painful, but until you fix it--until the end of time--everyone has to deal with the pain of the system not being fixed: I thereby personally think the goal should be to "actually fix" the system, once and for all.

norswap commented 2 years ago

I don't think this is as much of a problem as you think it is.

Apps that are currently running on Optimism have already contended with the issue of the re-genesis, if they have it at all (in my experience, it's only a small minority of dapps that offer a transaction history, for instance). Ironically, re-numbering again might break their fix. Post-regenesis apps don't have any problem.

I think you'll be interested, though, in this snippet from this hot-off-the-presses Vitalik piece:

But as it turns out, the Bloom filter mechanism is too slow to be user-friendly for dapps anyway, and so these days more and more people use TheGraph for querying anyway.

It is my impression (I don't have hard evidence to show for it) that besides using the Graph, some dapps likely maintain their own transaction DB. Renumbering post-regenesis blocks would mean breaking all of these databases.

Crypto tax software requires a lot of painstaking manual encoding (to understand the nature of dapp transactions, understand the various bridges, etc). I expect this not to be a problem for them (and totally expect us to exert our god-like influence to get them to handle this quirk ;)).

Anyhow, I'm happy to agree to disagree on this. I think the path forward is to ask users directly. (Tagging @K-Ho, who is the angel bringing us the sacred word of the users.)