citp / BlockSci

A high-performance tool for blockchain science and exploration
https://citp.github.io/BlockSci/
GNU General Public License v3.0

Parsing does not continue with already parsed blocks #340

Closed · cgebe closed this issue 5 years ago

cgebe commented 5 years ago

I have started parsing the blockchain with:

blocksci_parser --output-directory /home/BlockSci/Notebooks/data update disk --coin-directory /home/btcd/.bitcoin

I stopped the process at block 506011 and resumed parsing with the same command. However, it now starts over from block 1:

Removing 0 blocks
Adding 603000 blocks  
0.00% done, Block 1/603000

/home/BlockSci/Notebooks/data contains all previously parsed data: chain, hashIndex, mempool, parser, scripts

How can I continue parsing at block 506011?

maltemoeser commented 5 years ago

Interrupting the parser quite likely leads to data corruption (#33). If you don't want to parse the full chain, you can use --max-block xxxxxx to specify a maximum height (or a negative index to stop x blocks before the chain tip).
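
For example, re-running the same command with an explicit maximum height (500000 here is only illustrative) would look like this:

blocksci_parser --output-directory /home/BlockSci/Notebooks/data update disk --coin-directory /home/btcd/.bitcoin --max-block 500000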

cgebe commented 5 years ago

@maltemoeser Okay, thanks for the hint. Up to block 500k the parser performs well, but the speed drops sharply thereafter, likely due to the higher transaction count per block. That's why I interrupted it and tried to restart the machine. So does it currently make sense to parse the chain stepwise with --max-block 400000, --max-block 500000, --max-block 600000? It probably also makes sense to back up successfully parsed chain segments, as in the sketch below.
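
A minimal sketch of such a stepwise run, assuming the backup is a plain copy of the output directory (the backup destination and the use of cp are assumptions, not part of BlockSci):

for height in 400000 500000 600000; do
    # parse up to the next checkpoint; update mode continues from the previous run
    blocksci_parser --output-directory /home/BlockSci/Notebooks/data update disk --coin-directory /home/btcd/.bitcoin --max-block $height
    # back up the parsed data before moving on (destination path is hypothetical)
    cp -r /home/BlockSci/Notebooks/data /home/BlockSci/backups/data-$height
done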

mplattner commented 5 years ago

@cgebe Yes, the parser seems to slow down. This is, as you said, due to higher transaction counts per block towards the end of the chain.

If you want to parse the entire chain, I suggest starting the parser with --max-block -6. This parses up to 6 blocks before the chain tip. (A negative number is useful to avoid chain re-organizations.) Of course, it should also work fine to run the parser iteratively with an increasing --max-block value (e.g. 400000, 500000, etc.) if you want to create backups; however, this is not a requirement.
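
For instance, reusing the paths from the command above:

blocksci_parser --output-directory /home/BlockSci/Notebooks/data update disk --coin-directory /home/btcd/.bitcoin --max-block -6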

cgebe commented 5 years ago

I'll parse the chain iteratively now and back up the data at certain checkpoints. This also saves me from re-parsing from scratch if the hourly cronjob (sketched below) fails unexpectedly. Thank you for the help, great project! Cheers!
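
A hypothetical crontab entry for such an hourly update (the binary path and log file are assumptions):

0 * * * * /usr/local/bin/blocksci_parser --output-directory /home/BlockSci/Notebooks/data update disk --coin-directory /home/btcd/.bitcoin --max-block -6 >> /var/log/blocksci.log 2>&1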