geofurb opened this issue 5 years ago
Accessing an individual cluster's addresses takes a very long time and returns an empty list:
In an IPython console:
chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cx = blocksci.cluster.ClusterManager(CUSTOM_CLUSTER_DIR,chain)
len(cx.clusters())
Out[7]: 330464891
clist = list(cx.clusters())
a = clist[6]
a
Out[21]: <blocksci.cluster.Cluster at 0x7f17ebced688>
a.addresses
Out[22]: <blocksci.AddressIterator at 0x7efdf84d07a0>
[x for x in a.addresses]
Out[23]: []
a.type_equiv_size
Out[24]: 125
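One way to quantify both symptoms for a single cluster is a small helper that walks cluster.addresses once, times the walk, and compares the count with type_equiv_size. This is a minimal sketch. It assumes only the two attributes used in the session above: addresses is an iterable and type_equiv_size is an integer. audit_cluster and ClusterAudit are hypothetical names and are not part of BlockSci.

```python
import time
from dataclasses import dataclass


@dataclass
class ClusterAudit:
    """Result of walking one cluster's address iterator once."""
    address_count: int    # addresses actually yielded by cluster.addresses
    type_equiv_size: int  # value reported by cluster.type_equiv_size
    seconds: float        # wall-clock time spent iterating

    @property
    def looks_broken(self) -> bool:
        # The symptom reported here: a non-empty cluster whose iterator yields nothing.
        return self.type_equiv_size > 0 and self.address_count == 0


def audit_cluster(cluster) -> ClusterAudit:
    """Iterate cluster.addresses once and compare the count with type_equiv_size.

    Hypothetical helper: assumes `cluster` exposes `addresses` (iterable) and
    `type_equiv_size` (int), as the blocksci.cluster.Cluster objects above do.
    """
    start = time.perf_counter()
    count = sum(1 for _ in cluster.addresses)
    elapsed = time.perf_counter() - start
    return ClusterAudit(count, cluster.type_equiv_size, elapsed)
```

Based on the output above, audit_cluster(a) would be expected to report address_count=0 against type_equiv_size=125, so looks_broken would be True. The seconds field records how long that empty iteration took.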
I've uploaded my bitcoin-data and bitcoin-clusters directories here, in case they help with reproducing the error. You might want to let the download run while you're at lunch: it's 102 GB, and unpacking the *.tar.bz2 archive (which will also likely take a long time) yields something like 170 to 180 GB.
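The 102 GB archive unpacks to roughly 170 to 180 GB, so it may be worth confirming free space before extracting. The following is a minimal sketch that uses only the standard library (shutil.disk_usage and tarfile in "r:bz2" mode). The archive path and destination are placeholders. The 180 GB threshold simply mirrors the estimate above. extract_if_room is a hypothetical helper.

```python
import shutil
import tarfile
from pathlib import Path

GB = 10 ** 9  # decimal gigabytes; the sizes quoted above are approximate anyway


def extract_if_room(archive: str, dest: str, needed_bytes: int = 180 * GB) -> bool:
    """Extract a .tar.bz2 archive into `dest` only if enough disk space is free.

    Returns False, and extracts nothing, when free space on the filesystem
    holding `dest` is below `needed_bytes`. Returns True after extraction.
    """
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    if shutil.disk_usage(str(target)).free < needed_bytes:
        return False
    # "r:bz2" makes tarfile decompress the bzip2 stream while reading members.
    with tarfile.open(archive, mode="r:bz2") as tar:
        tar.extractall(str(target))
    return True
```

Running the check first avoids spending hours decompressing only to fail partway through on a full disk.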
Trying to get the size of a cluster returns a length of zero for every cluster. Reading c.type_equiv_size does not seem to cause this problem. The issue seems tied to a second behavior: iterating over a cluster takes an exceptionally long time, even for small clusters (e.g. one with c.type_equiv_size == 10). This may be related to #200.
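The session above calls list(cx.clusters()), which materializes all 330,464,891 cluster objects before any of them is inspected. To check whether every cluster is affected, a lazy sample taken with itertools.islice avoids that cost. This sketch assumes only that `clusters` is an iterable of objects exposing addresses and type_equiv_size, for instance cx.clusters(). survey_clusters is a hypothetical helper, and the default sample size is arbitrary.

```python
import itertools
import time


def survey_clusters(clusters, sample_size=1000):
    """Inspect the first `sample_size` clusters without building a full list.

    Returns (checked, empty_but_nonzero, slowest_seconds), where
    empty_but_nonzero counts clusters whose `addresses` iterator yields
    nothing even though `type_equiv_size` is positive.
    Hypothetical helper: assumes each item has `addresses` and `type_equiv_size`.
    """
    checked = 0
    empty_but_nonzero = 0
    slowest = 0.0
    for cluster in itertools.islice(clusters, sample_size):
        start = time.perf_counter()
        n = sum(1 for _ in cluster.addresses)
        slowest = max(slowest, time.perf_counter() - start)
        checked += 1
        if n == 0 and cluster.type_equiv_size > 0:
            empty_but_nonzero += 1
    return checked, empty_but_nonzero, slowest
```

For example, survey_clusters(cx.clusters(), sample_size=100) would examine only 100 clusters. If the behavior is as described, empty_but_nonzero would equal checked, and slowest_seconds would show how long even a small cluster's iteration takes.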
Reproduction Steps
System Information
- BlockSci version: 0.5
- Using AMI: no
- Compiled under: Ubuntu 16.04
- cmake version: 3.12.4
- gcc/g++: 7.3.0-21ubuntu1~16.04
- Anaconda version: 3.5.1 (Python 3.7.0)
- Total memory: 64 GB DRAM, 188 GB swap
- Dependencies installed: blocksci==0.5.0