XRPLF / clio

An XRP Ledger API Server
https://xrpl.org
ISC License

Create a tool to cryptographically verify clio's dataset #53

Closed cjcobb23 closed 1 year ago

cjcobb23 commented 2 years ago

We should create a tool that allows one to verify that their clio dataset is correct and complete. This tool should be able to verify state data and transaction data for a single ledger or for a range of ledgers.

One way to do this would be to recreate the SHAMap and then check that the hashes match. Reporting mode does something similar to this when it builds a new ledger. It is unclear how to do this outside of rippled, since we would need the SHAMap and the SHAMap is not available in xrpl_core.
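To illustrate what "recreate the SHAMap and check that the hashes match" could look like outside of rippled, here is a minimal Python sketch for the state tree only. It assumes the state objects are already available as a mapping from 32-byte key to serialized bytes. The hash prefixes and the leaf layout (prefix, then data, then key) are written from memory of rippled's conventions and should be confirmed against the rippled source before relying on them. The function names are hypothetical.

```python
import hashlib
from typing import List, Mapping, Tuple

# Assumed values: hash prefixes as recalled from rippled's HashPrefix
# definitions. Confirm against the rippled source before relying on them.
PREFIX_INNER = b"MIN\x00"       # inner node
PREFIX_STATE_LEAF = b"MLN\x00"  # account-state leaf
ZERO_HASH = bytes(32)           # hash of an empty branch or empty tree


def sha512_half(data: bytes) -> bytes:
    """First 256 bits of SHA-512."""
    return hashlib.sha512(data).digest()[:32]


def nibble(key: bytes, depth: int) -> int:
    """Return the 4-bit branch selector of `key` at tree depth `depth`."""
    byte = key[depth // 2]
    return byte >> 4 if depth % 2 == 0 else byte & 0x0F


def leaf_hash(key: bytes, data: bytes) -> bytes:
    # Assumed leaf layout: prefix || serialized object || key.
    return sha512_half(PREFIX_STATE_LEAF + data + key)


def inner_hash(items: List[Tuple[bytes, bytes]], depth: int) -> bytes:
    """Hash the inner node at `depth` that covers `items`."""
    branches: List[List[Tuple[bytes, bytes]]] = [[] for _ in range(16)]
    for key, data in items:
        branches[nibble(key, depth)].append((key, data))

    child_hashes = []
    for branch in branches:
        if not branch:
            child_hashes.append(ZERO_HASH)
        elif len(branch) == 1:
            # A key that is alone under this branch becomes a leaf here.
            child_hashes.append(leaf_hash(*branch[0]))
        else:
            child_hashes.append(inner_hash(branch, depth + 1))
    return sha512_half(PREFIX_INNER + b"".join(child_hashes))


def state_root(objects: Mapping[bytes, bytes]) -> bytes:
    """Root hash of a radix-16 Merkle tree over `objects`."""
    if not objects:
        return ZERO_HASH
    return inner_hash(list(objects.items()), 0)


def verify_state(objects: Mapping[bytes, bytes], expected_hex: str) -> bool:
    """Compare the recomputed root with a ledger header's account_hash."""
    return state_root(objects).hex().upper() == expected_hex.upper()
```

The tree is rebuilt from scratch rather than updated incrementally, which keeps the sketch short but means the cost grows with the full size of the state. The transaction tree would need a different leaf format and is not covered here.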

natenichols commented 2 years ago

I've been thinking about how to build this, and IMO we should ditch the idea of building a tool and do database verification after the ETL Load phase.

Because all of our queries should rely on a ledger_index, similar to the SHAMap, we will only be able to retrieve ledgers that have been verified in the database (i.e. ledgers for which we know all the correct data is present).

It shouldn't take more than a couple hundred milliseconds (it's two batch fetches and SHAMap operations in memory), and it fits well with keeping the latest validated ledger in memory.

We would still probably have to build a tool to verify the Clio data format in previous ledgers, especially since we plan to parallelize the initial data load.

cjcobb23 commented 2 years ago

It's hard for clio to do this, because it needs the SHAMap code, which is part of rippled and not part of xrpl_core. Plus, this would significantly lower write throughput. Putting any reads on the write path slows down throughput a lot, and then we have to do all of the hashing too. The writes currently take less than 100ms, so adding another couple hundred milliseconds would be quite a lot. However, I think the code complexity is the bigger issue. It's not clear to me whether we can actually use anything besides rippled to do this.

cjcobb23 commented 2 years ago

We're actually going to do this in Python over the API.
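As a rough idea of what verification "over the API" might involve, the sketch below pages through the `ledger_data` method in binary mode and compares a recomputed state root with the `account_hash` reported by the `ledger` method. The server URL, the page size, and the `root_fn` callable (whatever computes the tree root from the fetched objects) are assumptions for illustration and not part of any existing tool.

```python
import json
from typing import Callable, List, Tuple
from urllib import request

Call = Callable[[str, dict], dict]


def http_call(url: str) -> Call:
    """Build a JSON-RPC caller for a server at `url` (assumed endpoint)."""
    def call(method: str, params: dict) -> dict:
        body = json.dumps({"method": method, "params": [params]}).encode()
        req = request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with request.urlopen(req) as resp:
            return json.load(resp)["result"]
    return call


def fetch_header(call: Call, ledger_index: int) -> dict:
    """Fetch the ledger header, which carries account_hash."""
    return call("ledger", {"ledger_index": ledger_index})["ledger"]


def fetch_state(
    call: Call, ledger_index: int, limit: int = 2048
) -> List[Tuple[bytes, bytes]]:
    """Page through ledger_data and return (key, serialized object) pairs."""
    items: List[Tuple[bytes, bytes]] = []
    marker = None
    while True:
        params = {"ledger_index": ledger_index, "binary": True, "limit": limit}
        if marker is not None:
            params["marker"] = marker
        page = call("ledger_data", params)
        for obj in page["state"]:
            items.append((bytes.fromhex(obj["index"]), bytes.fromhex(obj["data"])))
        marker = page.get("marker")
        if marker is None:  # no marker means this was the last page
            break
    return items


def check_ledger(
    call: Call,
    ledger_index: int,
    root_fn: Callable[[List[Tuple[bytes, bytes]]], bytes],
) -> bool:
    """Return True if the recomputed state root matches account_hash."""
    header = fetch_header(call, ledger_index)
    computed = root_fn(fetch_state(call, ledger_index)).hex().upper()
    return computed == header["account_hash"].upper()
```

Passing the transport in as `call` keeps the pagination logic testable without a live server. A check over a range of ledgers would repeat `check_ledger` for each index, and transaction data would need a separate pass.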

godexsoft commented 1 year ago

This is currently implemented as a fork of rippled (https://github.com/cjcobb23/rippled/tree/verify) but should probably be implemented as a small standalone app if possible, or even as Python scripts as suggested above. Closing for now as there is an implementation in place already. How we choose to redo it later will be covered in separate issues.