citp / BlockSci

A high-performance tool for blockchain science and exploration
https://citp.github.io/BlockSci/
GNU General Public License v3.0

AMI: default disk size of 500GB is insufficient as of August 2019 #302

Open jiagengliu opened 5 years ago

jiagengliu commented 5 years ago

Reproduction Steps

Using an r4.2xlarge instance with the AMI supplied in https://citp.github.io/BlockSci/readme.html. After launching, run the following script:

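import blocksci

# Paths on the AMI: parsed chain data and the clustering output directory
home_dir = "/home/ubuntu/"
data_path = home_dir + "bitcoin"
cluster_path = home_dir + "bitcoin/clusters"

chain = blocksci.Blockchain(data_path)
# Subtracting a heuristic from itself leaves one that never flags a change output
no_change_heuristic = blocksci.heuristics.change.legacy() - blocksci.heuristics.change.legacy()
# Cluster the full chain; the result is written to cluster_path on the root disk
cm = blocksci.cluster.ClusterManager.create_clustering(cluster_path, chain, no_change_heuristic)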

After about one day, the root disk fills up and df returns:

/dev/xvda1  ... 0% ...

System Information

Using AMI: Yes
BlockSci version: 0.5
Blockchain: Bitcoin
Parser: Disk/RPC
Total memory: 61 GB

jiagengliu commented 5 years ago

It seems that the default disk size is 500 GB. Could that be too small for BlockSci?

mplattner commented 5 years ago

You are right, 500 GB is probably no longer enough to use BlockSci on Bitcoin. My fully synced Bitcoin node holds 284 GB of data, and the parsed BlockSci data adds roughly 200 GB (an estimate, since I don't have a fully parsed Bitcoin chain available). That is already about 484 GB before clustering, so with clustering you will easily need more than 500 GB, I guess.
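The arithmetic behind that estimate can be written out as a short sketch. The figures are the ones quoted in this comment (August 2019); the function name is hypothetical, and the space clustering itself needs is not stated anywhere in this thread, so the sketch only shows what is left over for it.

```python
# Figures quoted in the comment above (August 2019); treat them as rough.
NODE_DATA_GB = 284     # fully synced Bitcoin node
PARSED_DATA_GB = 200   # approximate size of the parsed BlockSci data
DEFAULT_DISK_GB = 500  # default root volume size of the AMI


def remaining_for_clustering(disk_gb, node_gb=NODE_DATA_GB, parsed_gb=PARSED_DATA_GB):
    """Return the GB left for clustering output, the OS, and everything else."""
    return disk_gb - (node_gb + parsed_gb)


print(remaining_for_clustering(DEFAULT_DISK_GB))  # 16
print(remaining_for_clustering(700))              # 216
```

With the default volume, only about 16 GB remain once the node data and parsed data are in place, which is consistent with the disk filling up during clustering.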

Regarding the AWS instance: I can't find default disk size settings for the r4.2xlarge instance. Isn't it defined by the user when creating the instance and its EBS volume?

Anyway, a note about the disk space requirements should be added to the docs.

maltemoeser commented 5 years ago

I believe the default disk size is the disk size in use when the AMI was created. I wasn't able to find a way to increase it for the existing AMI. I'll make sure to add a warning to the docs that a larger disk size should be chosen.

@jiagengliu you can increase the disk size using this guide

maltemoeser commented 5 years ago

Added in 2a597a4fd3cafe080078b6c8ed6bb05d094bccb0

I'll leave this open for a while, since it might affect other users too.

jiagengliu commented 5 years ago

Thank you @maltemoeser and @martinplattnr. Let me rephrase it: when launching your EC2 instance from the AMI linked in the readme, do NOT click the blue "Review and Launch" button right away. Instead, proceed through the configuration steps and change the size of the root volume to something above 700 GB.
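For anyone launching from a script instead of the console, the same root-volume override might look roughly like the sketch below. It only builds the keyword arguments for boto3's EC2 `run_instances` call and does not contact AWS. The AMI ID is a placeholder, the 750 GB size is just one value above 700 GB, and the root device name `/dev/sda1` is an assumption: it must match the root device name the AMI actually reports.

```python
def build_launch_params(image_id, root_device_name="/dev/sda1", root_size_gib=750):
    """Build keyword arguments for an EC2 run_instances call with a larger root volume.

    root_device_name is an assumption and must match the AMI's root device name.
    """
    return {
        "ImageId": image_id,
        "InstanceType": "r4.2xlarge",
        "MinCount": 1,
        "MaxCount": 1,
        # Override the size of the root EBS volume baked into the AMI.
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device_name,
                "Ebs": {"VolumeSize": root_size_gib},
            }
        ],
    }


# "ami-PLACEHOLDER" is not a real image ID; substitute the one from the readme.
params = build_launch_params("ami-PLACEHOLDER")

# With boto3 installed and credentials configured, one could then call:
#   import boto3
#   boto3.client("ec2").run_instances(**params)
```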

jiagengliu commented 5 years ago

@maltemoeser I didn't notice the issue until I failed to save my notebook. It may also be a good idea to warn the user when the parser is about to run out of disk space.
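A user-side version of such a warning could be as simple as the sketch below, run from a notebook before and during a long job. This is not BlockSci's own check; the function name and the 50 GB threshold are hypothetical.

```python
import shutil
import warnings


def warn_if_low_on_disk(path, min_free_gb=50):
    """Warn when the filesystem holding `path` has less than `min_free_gb` free.

    Returns the free space in GB so the caller can log it or act on it.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        warnings.warn(
            f"Only {free_gb:.1f} GB free on the volume containing {path}"
        )
    return free_gb


# Check the volume that holds the current working directory.
free = warn_if_low_on_disk(".")
```

On the AMI, passing the data directory (for example /home/ubuntu/bitcoin) would check the root volume that filled up in this report.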

maltemoeser commented 5 years ago

@jiagengliu having a warning would be nice indeed. However, I doubt that many users will check the parser logs, so a warning placed there will probably go largely unnoticed.

jiagengliu commented 5 years ago

Maybe it's also a good idea to point users to the parser logs in the documentation. I didn't know about the log and only checked the process monitor (top) to get a sense of what was going on.

maltemoeser commented 5 years ago

I've added a warning to the Python interface in v0.6, since that's probably how most users interact with BlockSci anyway. I've also started a list of useful warnings for the parser in #293.

Haaroon commented 5 years ago

Please can you guys update the AMI.

trekianov commented 4 years ago

Does anyone have a rough idea of the cost of running the AMI and having it update the Bitcoin data directory? I am struggling with my local machine (the update will take about 40 days) and am looking for a plan B.

jiagengliu commented 4 years ago

@trekianov Referring to #2, you could try launching an instance from the AMI and downloading the parsed data to your local machine to start your analysis, instead of maintaining a full node locally.

jiagengliu commented 4 years ago

@maltemoeser I have a dumb follow-up question: once we are done with parsing, is it safe to delete the original blockchain database (usually ~/.bitcoin)? Thank you!

Haaroon commented 4 years ago

> @maltemoeser I have a dumb follow-up question: once we are done with parsing, is it safe to delete the original blockchain database (usually ~/.bitcoin)? Thank you!

Hey, once you have parsed the data into the BlockSci format, it's perfectly safe to delete the original .bitcoin folder. BlockSci analysis will still work because it uses its own format.

However, once .bitcoin is deleted, you won't be able to parse new blocks to update the BlockSci data.