Move mempool data out of parser directory

citp / BlockSci

A high-performance tool for blockchain science and exploration

https://citp.github.io/BlockSci/

GNU General Public License v3.0

Move mempool data out of parser directory #332

Open maltemoeser opened 5 years ago

maltemoeser commented 5 years ago

[ ] Move mempool data outside of the parser data directory to prevent accidental deletion when reparsing the chain.
[ ] Add instructions on reparsing to the FAQ/documentation.

maltemoeser commented 5 years ago

Alternatively: add a blocksci_parser reset command that empties all directories except the mempool folder, with an optional flag to also remove the mempool data.
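A minimal sketch of what such a reset could do, written in Python purely for illustration. The function name reset_parser_data, the remove_mempool flag, and the assumption that mempool data lives in a subdirectory literally named "mempool" are all hypothetical; none of this is an existing blocksci_parser feature.

```python
import shutil
from pathlib import Path


def reset_parser_data(data_dir, keep=("mempool",), remove_mempool=False):
    """Delete everything directly inside data_dir except the entries in keep.

    Hypothetical helper: the "mempool" folder name is an assumption about
    the layout of the parser data directory.
    Returns the sorted names of the entries that were removed.
    """
    data_dir = Path(data_dir)
    # With the optional flag set, nothing is preserved.
    preserved = set() if remove_mempool else set(keep)
    removed = []
    for entry in sorted(data_dir.iterdir()):
        if entry.name in preserved:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a whole subdirectory
        else:
            entry.unlink()  # remove a plain file or a symlink
        removed.append(entry.name)
    return removed
```

Calling reset_parser_data(path) would leave the mempool folder untouched, while reset_parser_data(path, remove_mempool=True) would clear it as well.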

mplattner commented 5 years ago

That's a good idea. I think a separate directory that stores mempool data globally might be the better option. Then multiple parsed versions of a chain (e.g., at different block heights) can use the same mempool directory, without having to keep multiple directories updated or run multiple mempool recorders.

However, both options should be fairly easy to implement.

Related idea: It might be helpful for (new) users if mempool data were offered for download, as this is something that can't simply be re-created.

maltemoeser commented 5 years ago

@martinplattnr Going a step further, I think the current deep integration with BlockSci is not ideal: to record mempool timestamps you need a server that can run BlockSci 24/7, which is quite costly. Furthermore, the data format can't easily be parsed and may not even be reusable across different machines (#2). Ideally, there would be a lightweight client recording transaction timestamps, and then a tool that converts these into the optimized data format for BlockSci.

mplattner commented 5 years ago

Yes, you are right, the current implementation is not ideal. A simple Python script that connects to a Bitcoin node and logs <BlockHash, Timestamp> and <TxHash, Timestamp> pairs, and an importer, may be enough. I added some thoughts related to this and #2 in #2 itself.
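To make the idea concrete, here is a rough sketch of the logging and importing halves only. It assumes the hashes arrive from some node-facing loop that is not shown (for example, Bitcoin Core's ZMQ hash notifications could feed it). The names FirstSeenLog and load_timestamps and the CSV layout are invented for illustration and are not BlockSci's format.

```python
import csv
import time
from pathlib import Path


class FirstSeenLog:
    """Append-only CSV log of (kind, hash, unix_timestamp) rows.

    Hypothetical recorder: kind is "tx" or "block", and only the first
    sighting of each hash is written.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.seen = set()
        # Reload earlier entries so a restart does not log duplicates.
        if self.path.exists():
            with self.path.open(newline="") as f:
                for kind, hash_hex, _ts in csv.reader(f):
                    self.seen.add((kind, hash_hex))

    def record(self, kind, hash_hex, timestamp=None):
        """Log a hash the first time it is seen; return True if written."""
        key = (kind, hash_hex)
        if key in self.seen:
            return False
        ts = int(time.time()) if timestamp is None else int(timestamp)
        with self.path.open("a", newline="") as f:
            csv.writer(f).writerow([kind, hash_hex, ts])
        self.seen.add(key)
        return True


def load_timestamps(path, kind):
    """Importer side: return {hash: first-seen timestamp} for one kind."""
    result = {}
    with Path(path).open(newline="") as f:
        for row_kind, hash_hex, ts in csv.reader(f):
            if row_kind == kind:
                result.setdefault(hash_hex, int(ts))
    return result
```

A plain-text log like this stays portable across machines, and a separate converter could then turn the output of load_timestamps into whatever optimized format BlockSci needs.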