ethereum / glados

Portal network monitoring application.

Initial state-root based auditing strategy for state network. #235

Open pipermerriam opened 4 months ago

pipermerriam commented 4 months ago

What is wrong

We need an auditing strategy for state network.

Ideally, the strategy avoids some of the inherent complexity such as:

How can it be done.

I propose an initial strategy that is based on state roots.

The only initial data that would need to be loaded into the database would be state roots.

The strategy would perform random walks of the state data beginning at the state root and terminating when either data is determined to be unavailable or the walk terminates in a leaf node.

The walk begins with a state root, which can be used to retrieve the trie node associated with the state root. At each stage the following logic would be applied.

This strategy will have some shortcomings, but it should be simple to implement and it will be a viable jumping off point for more detailed and thorough auditing strategies.
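To make the shape of such a walk concrete, here is a rough Rust sketch (illustration only, not glados code): the `TrieNode` shape and the `fetch_node` lookup are simplified placeholders for real MPT node decoding and Portal network retrieval, and extension nodes and content-key construction are omitted.

```rust
use rand::seq::SliceRandom;

/// Simplified trie node for illustration; a real MPT node also has
/// extension nodes, RLP encoding, and embedded small nodes.
enum TrieNode {
    /// Up to 16 child hashes, indexed by nibble.
    Branch { children: [Option<[u8; 32]>; 16] },
    /// Terminal account (or storage) entry.
    Leaf,
}

/// Outcome of one random walk starting at a state root.
enum WalkResult {
    ReachedLeaf { path: Vec<u8> },
    Missing { path: Vec<u8>, node_hash: [u8; 32] },
}

/// Placeholder for a Portal state-network lookup of the trie node with the
/// given hash; returns `None` when the content is unavailable.
fn fetch_node(_node_hash: &[u8; 32]) -> Option<TrieNode> {
    unimplemented!("portal network lookup")
}

fn random_walk(state_root: [u8; 32]) -> WalkResult {
    let mut rng = rand::thread_rng();
    let mut path = Vec::new();
    let mut current = state_root;
    loop {
        let Some(node) = fetch_node(&current) else {
            // Data unavailable on the network: the walk ends in an audit failure.
            return WalkResult::Missing { path, node_hash: current };
        };
        match node {
            // The walk terminates successfully at a leaf.
            TrieNode::Leaf => return WalkResult::ReachedLeaf { path },
            TrieNode::Branch { children } => {
                // Pick a random populated child nibble and descend.
                let populated: Vec<(u8, [u8; 32])> = children
                    .iter()
                    .enumerate()
                    .filter_map(|(i, c)| c.map(|h| (i as u8, h)))
                    .collect();
                let (nibble, child) = *populated
                    .choose(&mut rng)
                    .expect("a valid branch node has at least two children");
                path.push(nibble);
                current = child;
            }
        }
    }
}
```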

There are a few choices that should be considered:

morph-dev commented 2 months ago

If the node is a leaf node and the account is a contract: Fetch the contract storage trie state root node and continue.

Just realized that this doesn't work.

In order to iterate the contract storage trie, we have to know the address of the smart contract (in order to construct the content key). However, if we discovered a smart contract account in the account trie using the steps above, we don't know its address. We know its path from the root, but the path is actually keccak(address), and there is no way to reverse it.

To work around it, we would have to build a path->address map for smart contracts.

Alternatively, we can decide on a smart contract address (e.g. by picking a random transaction that was a smart contract call), iterate the account trie to find its storage trie root, and randomly walk the storage trie as already described.
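To illustrate the workaround (names and types here are hypothetical, not existing trin/glados code), the map would simply be keyed by the account trie path, which is the keccak256 of the address:

```rust
use std::collections::HashMap;

use sha3::{Digest, Keccak256};

/// Hypothetical pre-image map from an account trie path (keccak256 of the
/// address) back to the 20-byte address it was derived from.
type PreimageMap = HashMap<[u8; 32], [u8; 20]>;

/// The hash is one-way, so the mapping has to be recorded at the moment the
/// address is first seen (e.g. while bridging/tracing), not reconstructed later.
fn record_preimage(map: &mut PreimageMap, address: [u8; 20]) {
    let path: [u8; 32] = Keccak256::digest(address).into();
    map.insert(path, address);
}
```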

pipermerriam commented 2 months ago

The simplest solution to this is to store the pre-image map in the network, which I think is a reasonable way to do this, except that where to source the pre-images is the difficult part. I think that means our bridges need to produce this mapping as part of their data export, since I think the only reliable way to get 100% coverage on the pre-images would be to source them from tracing the EVM execution?

cc @KolbyML

morph-dev commented 2 months ago

Pre-images are needed for the Merkle -> Verkle transition, so the Verkle team was already looking into this. I believe that some clients (not sure if RETH is one of them) already store pre-images (or at least they can with a flag).

With that in mind, the Verkle team might already have pre-images, or at least have a way to generate them easily.

KolbyML commented 2 months ago

It is very trivial to get the pre-image (aka the address for the account). We already get it, so it would be like 3 lines to return it so we can export them.

I think the only reliable way to get 100% coverage on the pre-images would be to source them from tracing the EVM execution?

We don't have to do anything special, as the REVM returns all the pre-images on each execution of a transaction.

I am assuming 100% coverage would be any address whose account was updated during the execution of a transaction. If you mean any intermediate hashes which changed when an account was updated in the MPT, then we still have to write code to detect that, but it should be trivial either way. It shouldn't be too hard no matter what happens, I guess, is what I am saying.

pipermerriam commented 2 months ago

Would anyone like to do the work to propose a spec change for this for the existing state network spec?

I think that we likely need to

The major complexity here is that I don't believe there is an easy way to define a canonical anchoring proof, and there are potentially infinite valid anchoring proofs for each mapping, which leaves us in an awkward situation as to how best to have clients reconstruct anchoring proofs for re-gossiping these payloads....

pipermerriam commented 2 months ago

After the face-palming realization that this is just an issue in glados for how we do auditing: it's worth noting that this isn't something we'll want to modify the spec for at this time, but it still raises the question of how we can effectively audit this data.

pipermerriam commented 2 months ago

For the short term, I think that implementing the strategy outlined in this issue, but simply ignoring the parts that dig into contract state, is probably the right way forward here.

KolbyML commented 1 month ago

There are a few choices that should be considered:

Upon encountering trie nodes, should we save them to the database? If we do, we can then perform individual audits of that data directly. Upon failure: what data should we save? Upon success: what data should we save?

Thinking about this: to my understanding the state is enormous, so I wouldn't want to save all trie nodes to the database, as that would be excessive.

the differences between hashmap and tree type portal networks in terms of auditing

I think the auditing schemes for different networks can be distinct. For example, history and state auditing are two different models for auditing content, across two different kinds of networks: one is a hashmap and one is an MPT. Both can be seen as key-value stores, but auditing both kinds of networks the same way wouldn't make sense, as it would end up being impossibly inefficient in the case of MPT network-type auditing. Since all of our current database models are built around history-type auditing, I think it may end up making sense to create tree-network-type (for example, the state network) database tables to better deal with the different auditing structures between the two network types. Requirements, constraints, etc.

my basic initial understanding of how tree/state network type auditing works

The auditing scheme just sounds like depth-first search to me (but the difference is we want to explore the whole tree, not find one node). So I would want to use an in-memory stack. In the stack we would store the content keys of paths we found.
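A rough sketch of what this stack-based walk could look like (`ContentKey`, `fetch_children`, and `record_audit` are placeholder names, not actual glados code):

```rust
/// Placeholder content key type; the real one encodes path + node hash.
type ContentKey = Vec<u8>;

/// Placeholder for a Portal network lookup plus node decoding: returns the
/// child content keys of the node, an empty Vec for a leaf, or None if the
/// node could not be retrieved.
fn fetch_children(_key: &ContentKey) -> Option<Vec<ContentKey>> {
    unimplemented!("portal network lookup")
}

/// Placeholder for writing one audit result to the glados database.
fn record_audit(_key: &ContentKey, _success: bool) {
    unimplemented!("database write")
}

/// Depth-first traversal of the whole trie using an in-memory stack of
/// content keys, rather than recursion.
fn walk_tree(root_key: ContentKey) {
    let mut stack = vec![root_key];
    while let Some(key) = stack.pop() {
        match fetch_children(&key) {
            // Node unavailable on the network: record a failed audit.
            None => record_audit(&key, false),
            Some(children) => {
                record_audit(&key, true);
                // Leaves contribute no children and simply end the branch.
                stack.extend(children);
            }
        }
    }
}
```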

what should we save to the database?

I think we should only store

I don't think we should save trie nodes unless they failed or are a successful leaf, as state is massive and storing the trie nodes is redundant because it can be assumed they are already stored. Also, assuming we store the trie node hashes, we can reconstruct all the valid nodes we traversed anyway, which is another reason it doesn't make sense to store them, to my understanding.

Another note: I would want to know the block number each success/failure is from.

So maybe we could have a table which just contains state roots for all blocks, and then another table which links to it, maybe something like:

table: state roots

| id (block height) | state root |
| ----------------- | ---------- |

table: state audits

| id | (table: state roots): id (block height) | content-key | trie node hash | status (success/fail) |
| -- | --------------------------------------- | ----------- | -------------- | ---------------------- |

This is nice because each state root has a unique block height, so we can use it as a primary key. I could be missing something, but I think this is a clean layout; maybe a few minor details can be changed, but yeah.
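As plain row structs (field names are illustrative only; the real glados models go through the ORM), that layout would be roughly:

```rust
/// One row per block: the block height doubles as the primary key, since
/// each block has exactly one state root.
struct StateRootRow {
    block_height: u64,
    state_root: [u8; 32],
}

/// One row per audit attempt, linked back to the state root it started from.
struct StateAuditRow {
    id: u64,
    /// Foreign key into the state roots table (i.e. the block height).
    state_root_id: u64,
    content_key: Vec<u8>,
    trie_node_hash: [u8; 32],
    success: bool,
}
```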

conclusion

I guess I wrote this to get some of my thoughts down. From reading the code and researching the task at hand, I think state network auditing (or I guess auditing different portal networks) will require a small refactor of the audit code, as just adding this to the list of audit strategies in the way the code is currently laid out just tastes wrong.

Of course, as my understanding changes from learning more, my decision can/will change. I am going to just bang out the basic walking-the-tree audit thing right now.

KolbyML commented 1 month ago

When I originally read the issue, my impression was that we were going to be doing full walks of a state tree, but I think I misread it and it is just random walks of trees. Well, that is easier, so it isn't the worst xD

pipermerriam commented 1 month ago

I think a random single walk is a nice/simple starting point.

Future iterations might include walking a range of the keyspace, which I think benefits somewhat from re-using much of the proof data in order to sample a much wider range of keys.
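The intuition (illustration only): keys that are adjacent in the keyspace share a long trie-path prefix, so the proof nodes along that prefix only need to be fetched once for the whole range.

```rust
/// Length of the shared prefix between two trie keys. Nodes along the shared
/// prefix are common to both proofs, so a range walk can reuse them instead
/// of re-fetching them for every sampled key.
fn shared_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

fn main() {
    let key_a = [0xab, 0xcd, 0x01];
    let key_b = [0xab, 0xcd, 0x02];
    // Adjacent keys: only the final branch/leaf differs.
    assert_eq!(shared_prefix_len(&key_a, &key_b), 2);
}
```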

It's a massive data set. I suspect that doing effective long term auditing of it is going to require some iteration cycles and probably running a much beefier fleet of nodes for glados to make use of.

pipermerriam commented 1 month ago

Btw, your analysis above on what to save in the database is fine. I agree that it may be overkill to store all of it, at least at this stage.

KolbyML commented 1 month ago

I am done implementing it: https://github.com/ethereum/glados/pull/259. The PR is ready; I just need to wait for Mike to merge his PR so I can rebase, then I will open it for review.