citp / BlockSci

A high-performance tool for blockchain science and exploration
https://citp.github.io/BlockSci/
GNU General Public License v3.0

Zero-length clusters #219

Open geofurb opened 5 years ago

geofurb commented 5 years ago

Calling size() on a cluster returns 0 for every cluster. c.type_equiv_size does not seem to be affected and returns non-zero values. The issue seems to be tied to a behavior where iterating over a cluster takes an exceptionally long time, even for small clusters (e.g. c.type_equiv_size=10). This may be related to #200.

cm = blocksci.cluster.ClusterManager(cluster_data_dir, chain)
for c in cm.clusters():
    print(c.size())

Reproduction Steps

import blocksci
import blocksci.cluster

chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cm = blocksci.cluster.ClusterManager(BITCOIN_CLUSTER_DIR, chain)

# Every cluster prints 0 here
for c in cm.clusters():
    print(c.size())
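To narrow this down without walking all clusters, the check can be limited to the first few. The following is only a sketch: `find_size_mismatches` is a hypothetical helper (not part of BlockSci), and it assumes nothing beyond what the snippets above already use, namely that `cm.clusters()` is iterable, that `size()` is a method, and that `type_equiv_size` is an attribute.

```python
from itertools import islice


def find_size_mismatches(clusters, limit=1000):
    """Return (index, size, type_equiv_size) for each cluster whose size()
    is 0 while type_equiv_size is non-zero, checking at most `limit` clusters.

    Hypothetical helper for this report; not a BlockSci function.
    """
    mismatches = []
    for i, c in enumerate(islice(clusters, limit)):
        size = c.size()              # reported as 0 for every cluster
        equiv = c.type_equiv_size    # reported as non-zero (e.g. 10, 125)
        if size == 0 and equiv > 0:
            mismatches.append((i, size, equiv))
    return mismatches


# Usage against a real ClusterManager (not run here):
#   bad = find_size_mismatches(cm.clusters(), limit=100)
#   print(len(bad), "of the first 100 clusters report size() == 0")
```

If every inspected cluster shows up in the result, the problem is likely in how the cluster data is read rather than in any particular cluster.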

System Information

BlockSci version: 0.5
Using AMI: no
Compiled under Ubuntu 16.04
cmake version 3.12.4
gcc/g++ 7.3.0-21ubuntu1~16.04
Anaconda version 3.5.1 (Python 3.7.0)
Total memory: 64 GB DRAM, 188 GB swap

Dependencies installed: blocksci==0.5.0

geofurb commented 5 years ago

Accessing an individual cluster's addresses takes a very long time and returns an empty list:

IPython console

chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cx = blocksci.cluster.ClusterManager(CUSTOM_CLUSTER_DIR, chain)
len(cx.clusters())
Out[7]: 330464891
clist = list(cx.clusters())
a = clist[6]
a
Out[21]: <blocksci.cluster.Cluster at 0x7f17ebced688>
a.addresses
Out[22]: <blocksci.AddressIterator at 0x7efdf84d07a0>
[x for x in a.addresses]
Out[23]: []
a.type_equiv_size
Out[24]: 125
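
As a side note for anyone reproducing this, `list(cx.clusters())` builds a list of about 330 million cluster objects just to look at one of them. A lighter way to fetch a single cluster is sketched below. `nth_item` is a hypothetical helper, and it assumes only that `cx.clusters()` is iterable (as the `for` loops above suggest). It does not address the empty `addresses` result itself.

```python
from itertools import islice


def nth_item(iterable, n):
    """Return the item at zero-based position n, or None if the iterable
    has fewer than n + 1 items. Only the first n + 1 items are consumed."""
    return next(islice(iterable, n, None), None)


# Usage against a real ClusterManager (not run here):
#   a = nth_item(cx.clusters(), 6)
#   print(a.type_equiv_size, [x for x in a.addresses])
```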

geofurb commented 5 years ago

I've uploaded my bitcoin-data and bitcoin-clusters directories here, in case it helps with reproducing the error. You might want to let that run while you're at lunch: it's a 102 GB download, and unpacking the *.tar.bz2 (which will also likely take a long time) yields roughly 170-180 GB.