citp / BlockSci

A high-performance tool for blockchain science and exploration
https://citp.github.io/BlockSci/
GNU General Public License v3.0
1.34k stars 259 forks source link

Calculating the average age of utxos within age bins #329

Closed simonohanlon101 closed 5 years ago

simonohanlon101 commented 5 years ago

Hi there, I'm trying to adapt this gist from @hkalodner to calculate the average age of UTXOs within each age band (rather than the proportion of the total supply in each age band at each block).

I really liked the net difference approach, and assumed it would be relatively trivial to apply it to calculate the ages, but I'm going wrong somewhere and I can't see where. Hoping for some help here.

I adapted the original calculateImpl () function which was to calculate net change in balance at each block, and my new function is supposed to calculate the net difference in the count of utxos in each age band at each block. We use the calculateNetMulti() function from that gist to apply this function in a map/reduce manner returning the cumulative sum, which should give us the count of utxos in each age band at any given point in time. I thought, the cumulative sum of this (as each utxo that exists from one block into the next has aged by a block), divided by the count should give the average age in blocks, but I'm going wrong somewhere. Hoping someone can see where!

def calculateAges(blocks):

    totals = np.zeros((len(chain), len(buckets)))

    for block in blocks:

        never_spent = np.sum(block.outputs.unspent.value > 0)
        spent_outputs = block.outputs.spending_tx.has_value
        output_heights = block.outputs.spending_tx.block_height.with_value
        output_ages = output_heights - block.height
        age_sort = np.argsort(output_ages)
        sorted_ages = output_ages[age_sort]

        if sum(spent_outputs) > 0:

            sorted_values = output_values[age_sort]
            cuts = np.searchsorted(sorted_ages, cutoffs, side="left")
            out = np.add.reduceat(np.concatenate((np.ones_like(sorted_ages,dtype=int), np.zeros(1, dtype=int))), cuts)
            stop_points = np.empty_like(cuts)
            stop_points[:-1] = cuts[1:]
            stop_points[-1] = len(sorted_values)
            out[cuts == stop_points] = 0

            for i, cut in enumerate(zip(cuts[:-1], cuts[1:])):
                np.subtract.at(totals, (output_heights[age_sort][cut[0]:cut[1]], i), 1)
        else:
            out = np.zeros(len(cutoffs))

        out[-1] = never_spent
        out = out[::-1].cumsum()[::-1]

        for i, (start, end) in enumerate(buckets):
            if block.height + start < len(chain):
                totals[block.height + start, i] += out[i]
                if i > 0:
                    totals[block.height + start, i - 1] -= out[i]
    return totals

System Information (if applicable)

Using AMI: yes BlockSci version: 0.5 Blockchain: Bitcoin Parser: Disk Total memory: 128 GB

simonohanlon101 commented 5 years ago

Since I only needed to calculate the state at the end of each day, rather than for each block, I went for a more brute force approach in the end. The first task was to create a 3 column table for each and every output with a value > 0, the first column containing the block where the output was spent, and the second column containing the block at which that output was spent. Last column contains bitcoin value. From that, we can work out, for the last block in each day, what the average age of unspent outputs was, binned into different age buckets:

def getAges(blocks):

    utxos = []

    for block in blocks:

        # update progress to screen for sanity
        #if block.height % 10000 == 0:
            #print(f'Processing block:\t{block.height}')

        output_heights = np.array(block.outputs.spending_tx.block_height.all, dtype='float64')
        output_heights = output_heights.reshape(len(output_heights),1)

        output_values = block.outputs.value / 1e8
        output_values = output_values.reshape(len(output_heights),1)

        created_at = np.full(output_heights.shape, fill_value = block.height, dtype='float64')

        result = np.concatenate( (created_at, output_heights, output_values), axis = 1)
        result = result[ result[:,2] > 0 ]
        utxos.append(result)

    utxo_ages = np.concatenate(utxos)
    return utxo_ages

One issue with this approach when using it in a mapreduce framework, is that the pickled objects returned can be quite large .- larger than the maximum allowed. To get round this, just use more cores, to split the chain into smaller block ranges:

def calculateAgesMulti(chain, func):

    def mapFunc(blocks):
        return [func(blocks)]

    def reduceFunc(accum, new_val):
        accum.extend(new_val)
        return accum

    parts = chain.mapreduce_block_ranges(mapFunc, reduceFunc, cpu_count=24)

    return np.concatenate(parts)

Then run: result = calculateAgesMulti(chain, getAges)