citp / BlockSci

A high-performance tool for blockchain science and exploration
https://citp.github.io/BlockSci/
GNU General Public License v3.0

RPC Parser can run out of memory #262

Open maltemoeser opened 5 years ago

maltemoeser commented 5 years ago

Parsing Bitcoin Core over RPC runs out of memory while fetching block headers.

blocksci_parser rpc-btc.json update

> Locking data directory.
> 100.0% done fetching block headers

> fish: “blocksci_parser rpc-btc.json up…” terminated by signal SIGKILL (Forced quit)

Output of /var/log/syslog:

Apr 13 12:57:29 citp-blocksci kernel: [519699.701563] Out of memory: Kill process 18709 (blocksci_parser) score 779 or sacrifice child
Apr 13 12:57:29 citp-blocksci kernel: [519699.701595] Killed process 18709 (blocksci_parser) total-vm:54629304kB, anon-rss:52755136kB, file-rss:0kB, shmem-rss:0kB
Apr 13 12:57:31 citp-blocksci kernel: [519701.801718] oom_reaper: reaped process 18709 (blocksci_parser), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

System Information

BlockSci version: v0.6
Using AMI: no
Total memory: 60 GB

maltemoeser commented 5 years ago

Currently, the RPC parser uses the getblock command, which fetches the whole block rather than just the header (getblockheader was added in Bitcoin Core v0.12). It then keeps all of that data in memory, which exceeds the memory available on our machine.
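For a rough sense of scale, here is a back-of-the-envelope sketch in Python comparing the resident memory reported in the syslog excerpt with what raw headers alone would occupy. The 80-byte figure is the size of a serialized Bitcoin block header. The block count of 600,000 is an assumed round number, not a value taken from this report, and the kernel's "kB" is treated as 1024 bytes. The in-memory representation inside the parser would be larger than the raw header, so read this as an order-of-magnitude estimate only.

```python
# Rough memory comparison: full-block fetch (observed) vs. raw headers (estimated).

HEADER_BYTES = 80              # size of a serialized Bitcoin block header
ASSUMED_BLOCK_COUNT = 600_000  # assumption: round chain height, not from the report
OBSERVED_ANON_RSS_KIB = 52_755_136  # anon-rss value from the OOM killer log line


def header_footprint_mib(block_count: int, header_bytes: int = HEADER_BYTES) -> float:
    """Return the size of `block_count` raw headers in MiB."""
    return block_count * header_bytes / (1024 ** 2)


def kib_to_gib(kib: int) -> float:
    """Convert KiB (as reported by the kernel) to GiB."""
    return kib / (1024 ** 2)


if __name__ == "__main__":
    print(f"observed anon-rss: {kib_to_gib(OBSERVED_ANON_RSS_KIB):.1f} GiB")
    print(f"raw headers only:  {header_footprint_mib(ASSUMED_BLOCK_COUNT):.1f} MiB")
```

Under these assumptions the headers come to tens of MiB, against roughly 50 GiB resident when the process was killed.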

In the future, we may want to patch the Bitcoin CPP API to include the getblockheader command, do the initial fetch with block headers only, and then lazily load the remaining block data from the node when the parse needs it.
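The header-first, lazy-load idea could look roughly like the Python sketch below. This is an illustration only, not BlockSci's parser code (which is C++). The `rpc` callable, the `LazyBlock` class, and `fetch_headers` are hypothetical names introduced here. The RPC method names `getblockhash`, `getblockheader`, and `getblock` are real Bitcoin Core commands, and the sketch assumes their default verbose JSON output, which includes a `hash` field and, for `getblock`, a `tx` list.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: rpc(method, *params) returns the decoded JSON result.
RpcCall = Callable[..., Any]


class LazyBlock:
    """Holds a block header and fetches the full block only on demand."""

    def __init__(self, rpc: RpcCall, header: Dict[str, Any]) -> None:
        self._rpc = rpc
        self.header = header
        self._full: Optional[Dict[str, Any]] = None  # not loaded yet

    @property
    def block_hash(self) -> str:
        return self.header["hash"]

    def transactions(self) -> List[Any]:
        """Load the full block on first access, then reuse the cached copy."""
        if self._full is None:
            self._full = self._rpc("getblock", self.block_hash)
        return self._full["tx"]

    def release(self) -> None:
        """Drop the full block so only the header stays in memory."""
        self._full = None


def fetch_headers(rpc: RpcCall, start_height: int, end_height: int) -> List[LazyBlock]:
    """Fetch headers for heights start_height..end_height (inclusive)."""
    blocks = []
    for height in range(start_height, end_height + 1):
        block_hash = rpc("getblockhash", height)
        header = rpc("getblockheader", block_hash)  # header only, no transactions
        blocks.append(LazyBlock(rpc, header))
    return blocks
```

A parser built this way would call `transactions()` while processing a block and `release()` once it is done, so at most a few full blocks are resident at any time.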