Closed cjcobb23 closed 1 year ago
I've been thinking about how to build this, and IMO we should ditch the idea of building a tool and do database verification after the ETL Load phase.
transaction_hash
and state_hash
ledgers_table
Because all of our queries should rely on a ledger_index, similar to the SHAMap, we will only be able to retrieve ledgers that have been verified in the database (i.e. we know all the correct data is present).
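To make the "only verified ledgers are retrievable" idea concrete, here is a minimal in-memory sketch. The class and method names (`VerifiedLedgerStore`, `mark_verified`, `fetch`) are hypothetical and not part of Clio; a real implementation would sit on top of the database and the ETL pipeline.

```python
class VerifiedLedgerStore:
    """Toy store: data can be written at any time, but reads are gated on verification."""

    def __init__(self):
        self._ledgers = {}      # ledger_index -> data written by the (hypothetical) ETL load
        self._verified = set()  # ledger indexes whose hashes have been checked

    def write(self, ledger_index, data):
        # Writing alone does not make a ledger visible to queries.
        self._ledgers[ledger_index] = data

    def mark_verified(self, ledger_index):
        # Called only after the recomputed hashes matched the ledger header.
        if ledger_index not in self._ledgers:
            raise KeyError(f"ledger {ledger_index} was never written")
        self._verified.add(ledger_index)

    def fetch(self, ledger_index):
        # Every query goes through ledger_index, so unverified data is unreachable.
        if ledger_index not in self._verified:
            raise LookupError(f"ledger {ledger_index} is not verified")
        return self._ledgers[ledger_index]
```

With this shape, a ledger that was written but failed (or has not yet reached) verification simply looks absent to readers.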
It shouldn't take more than a couple hundred milliseconds (it's two batch fetches and SHAMap operations in memory), and it fits well with keeping the latest validated ledger in memory.
We would still probably have to build a tool to verify the Clio data format in previous ledgers, especially since we plan to parallelize the initial data load.
It's hard for Clio to do this, because it needs the SHAMap code, which is part of rippled and not part of xrpl_core. Plus, this would significantly lower write throughput. Putting any reads on the write path slows throughput a lot, and then we have to do all of the hashing too. The writes currently take less than 100ms, so adding another couple hundred milliseconds would be quite a lot. However, I think the code complexity is the bigger issue. It's not clear to me whether we can actually use anything besides rippled to do this.
We're actually going to do this in python over the API
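As a rough idea of what a Python check over the API could look like, the sketch below compares ledger header hashes from a Clio server against a reference rippled server for a range of ledgers. It assumes both servers answer the JSON-RPC `ledger` method and that the response carries `ledger_hash`, `account_hash` (the state hash) and `transaction_hash`; the function names and the injectable fetchers are illustrative only. Note that this only checks that the headers agree, not that the stored objects actually hash to those values.

```python
import json
import urllib.request

# Header fields compared between the two servers (assumed field names).
HASH_FIELDS = ("ledger_hash", "account_hash", "transaction_hash")


def fetch_header(url, ledger_index):
    """POST a JSON-RPC `ledger` request to `url` and return the ledger header dict."""
    body = json.dumps({
        "method": "ledger",
        "params": [{"ledger_index": ledger_index, "transactions": False}],
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]["ledger"]


def compare_range(first, last, fetch_clio, fetch_reference):
    """Return (ledger_index, field, clio_value, reference_value) for every mismatch."""
    mismatches = []
    for seq in range(first, last + 1):
        clio, ref = fetch_clio(seq), fetch_reference(seq)
        for field in HASH_FIELDS:
            if clio.get(field) != ref.get(field):
                mismatches.append((seq, field, clio.get(field), ref.get(field)))
    return mismatches
```

A caller would pass something like `lambda seq: fetch_header(clio_url, seq)` for each side, where `clio_url` and the rippled URL are whatever endpoints are being checked; a single ledger is just the range `first == last`.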
This is currently implemented as a fork of rippled (https://github.com/cjcobb23/rippled/tree/verify) but should probably be implemented as a small standalone app if possible. Or even Python scripts, as suggested above.
Closing for now as there is an implementation in place already. How we choose to redo it later will be covered in separate issues.
We should create a tool that lets users verify that their Clio dataset is correct and complete. This tool should be able to verify state data and transaction data for a single ledger or for a range of ledgers.
One way to do this would be to recreate the SHAMap and then check that the hashes match. Reporting mode does something similar to this when it builds a new ledger. It is unclear how to do this outside of rippled, since we would need the SHAMap and the SHAMap is not available in xrpl_core.
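The rebuild-and-compare idea can be illustrated with a simplified 16-way Merkle trie. This is not rippled's SHAMap: the hash prefixes (`b"LEAF"`, `b"INNR"`), the leaf layout and the function names are placeholders, so the resulting hashes will not match real ledger headers. It only shows the shape of the check: load all objects for a ledger, recompute the root, and compare it against the hash stored in the header.

```python
import hashlib

ZERO = bytes(32)  # hash used for an empty branch (and for an empty tree)


def sha512_half(data: bytes) -> bytes:
    """First 256 bits of SHA-512."""
    return hashlib.sha512(data).digest()[:32]


def leaf_hash(key: bytes, blob: bytes) -> bytes:
    # Placeholder prefix and layout; the real SHAMap uses its own prefixes.
    return sha512_half(b"LEAF" + blob + key)


def tree_hash(items: dict, depth: int = 0) -> bytes:
    """Root hash of a 16-way trie over {32-byte key: blob}, branching on key nibbles."""
    if not items:
        return ZERO
    if len(items) == 1 and depth > 0:
        # A lone item below the root is stored as a leaf.
        (key, blob), = items.items()
        return leaf_hash(key, blob)
    buckets = [dict() for _ in range(16)]
    for key, blob in items.items():
        byte = key[depth // 2]
        nibble = byte >> 4 if depth % 2 == 0 else byte & 0x0F
        buckets[nibble][key] = blob
    children = b"".join(tree_hash(bucket, depth + 1) for bucket in buckets)
    return sha512_half(b"INNR" + children)


def verify(items: dict, expected_root: bytes) -> bool:
    """True if the objects loaded from the database reproduce the expected root hash."""
    return tree_hash(items) == expected_root
```

Because the root depends only on the set of key/blob pairs, the order in which rows come back from the database does not matter, and any missing or altered object changes the root.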