gcarq / rusty-blockparser

Bitcoin Blockchain Parser written in Rust language
GNU General Public License v3.0
370 stars, 145 forks

unspentcsvdump not generating actual consolidated data? #60

Closed Goro2030 closed 4 years ago

Goro2030 commented 4 years ago

Great initiative with this software, congratulations!

I've just run the following to get the full unspent list of addresses as of yesterday:

rusty-blockparser unspentcsvdump .

To test the accuracy of the calculations, I picked a set of random addresses and looked them up in a block explorer, and the balances never match.

Am I mixing up the "Unspent" definition that you use with the "Address Balance" definition I'm looking for? I want a list of all currently used addresses that have a non-zero balance, along with the balance itself. Is that possible?

For example, I got this as one of the last records in the dump (again, blockchain as of yesterday):

78fee78d871f2af02f3beaa348bae6680c488ec8e937ed99688e77a5a0aaed90;3;364963;1000;1rY7jaoyXUEcQ9scHEK4japsUf6eJ38QP
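Each line of the dump describes a single unspent output, not an address total. A minimal sketch of splitting one record into named fields, assuming the column order txid;indexOut;height;value;address (only the positions of value and address are confirmed by the summing script later in this thread) and that value is in satoshis. The `UnspentRecord` and `parse_record` names are hypothetical:

```python
from typing import NamedTuple

class UnspentRecord(NamedTuple):
    """One row of the unspentcsvdump output (field names are assumptions)."""
    txid: str
    index_out: int
    height: int
    value_sat: int
    address: str

def parse_record(line: str) -> UnspentRecord:
    # Fields are separated by ';' and the line may end with a newline.
    txid, index_out, height, value, address = line.rstrip("\n").split(";")
    return UnspentRecord(txid, int(index_out), int(height), int(value), address)

rec = parse_record(
    "78fee78d871f2af02f3beaa348bae6680c488ec8e937ed99688e77a5a0aaed90;3;364963;1000;"
    "1rY7jaoyXUEcQ9scHEK4japsUf6eJ38QP"
)
# A single 1000-satoshi output is only 0.00001 BTC of that address's balance.
print(rec.address, f"{rec.value_sat / 100_000_000:.8f}")
```

Getting an address balance means summing `value_sat` over every record that shares the same address.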

And that address currently holds more than 0.20 BTC (made up of many UTXOs).

This other entry, which was actually the last one in the file, is correct, because only one transaction is associated with that address:

59caede5124946d243471b0a13661005ea5242d6924539faa73717c6e5154e99;0;277966;100000;1A5RJxRF21EaEauzcdah95Er71W9mzpyJm

And its current balance is 0.001 BTC.

I've seen other parsers that first parse the transactions' UTXOs and then run a bash command to add them up by address to get the result... Are you doing something like that (or should you)?

Thanks in advance!

gcarq commented 4 years ago

Hi, thanks for reporting. Did you create the dump with master or v0.7.0?

Yes, the assumption is correct: unspentcsvdump should give you a CSV of all addresses with a balance > 0.

I've seen other parsers that first parse the transactions' UTXOs and then run a bash command to add them up by address to get the result... Are you doing something like that (or should you)?

This is solved with a HashMap; the logic is implemented here.
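The general idea behind tracking unspent outputs in a hash map can be sketched as follows: every transaction output is inserted under its (txid, output index) key, and every input removes the output it spends, so whatever remains after replaying the chain is the unspent set. This is a simplified Python illustration with hypothetical `Tx`/`TxOut` types, not the project's Rust implementation:

```python
from dataclasses import dataclass, field

@dataclass
class TxOut:
    value_sat: int
    address: str

@dataclass
class Tx:
    txid: str
    # Outpoints (txid, index) consumed by this transaction's inputs.
    inputs: list[tuple[str, int]] = field(default_factory=list)
    outputs: list[TxOut] = field(default_factory=list)

def build_utxo_set(blocks: list[list[Tx]]) -> dict[tuple[str, int], TxOut]:
    """Replay blocks in chain order and return the outputs that were never spent."""
    utxos: dict[tuple[str, int], TxOut] = {}
    for block in blocks:
        for tx in block:
            # Inputs remove the outputs they spend; unknown outpoints are ignored.
            for outpoint in tx.inputs:
                utxos.pop(outpoint, None)
            # Every new output becomes spendable under (txid, index).
            for index, out in enumerate(tx.outputs):
                utxos[(tx.txid, index)] = out
    return utxos

blocks = [
    [Tx("a", outputs=[TxOut(5000, "addr1")])],
    [Tx("b", inputs=[("a", 0)], outputs=[TxOut(3000, "addr2"), TxOut(1900, "addr1")])],
]
utxos = build_utxo_set(blocks)
print(sorted(utxos))
```

Note that the result is keyed by output, not by address, which is why a per-address balance needs one more aggregation step.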

I will take a look.

This bug might also stem from the tx script not being parsed properly. Right now this parser supports only a handful of tx types (Pay2PublicKey, Pay2PublicKeyHash and Pay2ScriptHash), and sometimes the associated address can't be parsed correctly.
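To illustrate what supporting a "tx type" involves: standard output scripts follow fixed byte templates, and a parser has to recognise the template before it can extract an address. A simplified sketch that checks only the three types named above (real parsers also validate the public key bytes); the `classify_script` helper and its return labels are hypothetical. Anything else, such as a native SegWit or OP_RETURN output, falls through as unrecognised in this sketch:

```python
OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG, OP_EQUAL = 0x76, 0xA9, 0x88, 0xAC, 0x87

def classify_script(script: bytes) -> str:
    """Classify a scriptPubKey by its byte template (simplified, hypothetical helper)."""
    # Pay2PublicKeyHash: OP_DUP OP_HASH160 <push 20> <hash> OP_EQUALVERIFY OP_CHECKSIG
    if (len(script) == 25 and script[0] == OP_DUP and script[1] == OP_HASH160
            and script[2] == 20 and script[23] == OP_EQUALVERIFY and script[24] == OP_CHECKSIG):
        return "Pay2PublicKeyHash"
    # Pay2ScriptHash: OP_HASH160 <push 20> <hash> OP_EQUAL
    if len(script) == 23 and script[0] == OP_HASH160 and script[1] == 20 and script[22] == OP_EQUAL:
        return "Pay2ScriptHash"
    # Pay2PublicKey: <push 33 or 65> <compressed or uncompressed pubkey> OP_CHECKSIG
    if len(script) in (35, 67) and script[0] == len(script) - 2 and script[-1] == OP_CHECKSIG:
        return "Pay2PublicKey"
    return "NotRecognised"

p2pkh = bytes([OP_DUP, OP_HASH160, 20]) + bytes(20) + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
p2wpkh = bytes([0x00, 20]) + bytes(20)  # native SegWit, not handled by this sketch
print(classify_script(p2pkh), classify_script(p2wpkh))
```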

I'm currently working on replacing the custom script parsing with rust-bitcoin, which should make it more bulletproof.

gcarq commented 4 years ago

Actually I misread your question, sorry!

Am I mixing up the "Unspent" definition that you use with the "Address Balance" definition I'm looking for? I want a list of all currently used addresses that have a non-zero balance, along with the balance itself. Is that possible?

Yes, in its current state you would still have to sum all UTXOs manually. For testing purposes, this script should give you the correct amount for the given address:

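import sys

# Sum the values (in satoshis) of all unspent outputs that belong to one address.
# Usage: python sum.py <address>
balance = 0

with open("unspent-0-639626.csv") as fp:
    for line in fp:
        # Only the last two fields are needed: value (satoshis) and address.
        _, _, _, value, address = line.rstrip().split(";")
        if address == sys.argv[1]:
            balance += int(value)

print(f"total balance: {balance}")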
$ python sum.py 1rY7jaoyXUEcQ9scHEK4japsUf6eJ38QP
total balance: 20855000

I can implement a balance callback, which gives you a CSV in address;balance format.
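Until such a callback exists, the same result can be approximated by extending the script above to keep a running total per address in a dictionary, the Python counterpart of the HashMap approach. A sketch, assuming the five-column layout the script above relies on and an optional header line; the script name and output file name are hypothetical:

```python
import sys
from collections import defaultdict

def balances_from_unspent(lines) -> dict[str, int]:
    """Sum UTXO values (satoshis) per address from unspentcsvdump lines."""
    totals: dict[str, int] = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split(";")
        if len(parts) != 5 or not parts[3].isdigit():
            continue  # skip a header row or malformed lines
        _, _, _, value, address = parts
        totals[address] += int(value)
    # Keep only addresses that still hold funds.
    return {addr: sat for addr, sat in totals.items() if sat > 0}

def main(in_path: str, out_path: str) -> None:
    with open(in_path) as src:
        balances = balances_from_unspent(src)
    with open(out_path, "w") as dst:
        dst.write("address;balance\n")
        for address, balance in balances.items():
            dst.write(f"{address};{balance}\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python balances.py unspent-0-639626.csv balances.csv
    main(sys.argv[1], sys.argv[2])
```

The whole UTXO dump is streamed line by line, so memory grows with the number of distinct addresses rather than the number of outputs.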

Goro2030 commented 4 years ago

Yes, your last reply is exactly what I want, but for every address with a balance > 0 in the blockchain, not just one specific address.

You said "I can implement a balance callback". Can you do that pretty quickly, or is it too complicated?

Edit: Yes, I'm running the latest version (master).

gcarq commented 4 years ago

I've already implemented it; it's not complicated since most of the code is already there. I will push a branch in a few minutes. It would be nice if you could test it. I did a quick test up to block height 200000 and it seems OK.

Goro2030 commented 4 years ago

OK! I'll test it and report back in 6 hours (the time a full parse takes on a Ryzen 7 3700X with an NVMe drive and 30 GB of RAM allocated to the Ubuntu 20 VM :) )


gcarq commented 4 years ago

Here we go: https://github.com/gcarq/rusty-blockparser/pull/61. The branch is called impl-balances-callback

Goro2030 commented 4 years ago

While it's running, let me ask you a question about the verbosity level parameter... I'm trying to do this:

./rusty-blockparser -v 1 -e 1000 balances .
error: Found argument '1' which wasn't expected, or isn't valid in this context

USAGE:
    rusty-blockparser -v

For more information try --help

Why is that error happening?

gcarq commented 4 years ago

-v takes no value; to increase verbosity, just pass it multiple times, e.g. -vv. With a single -v, your command becomes: ./rusty-blockparser -v -e 1000 balances .
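This is a common command-line convention: the flag is a counter, and each repetition raises the level, so a value such as `1` is read as the next positional argument, which is why it was rejected. The parser itself is written in Rust, so the following only illustrates the convention with Python's `argparse` (`action="count"`); it is not the project's argument handling, and the positional argument names are hypothetical:

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# Each occurrence of -v increments args.verbose; the flag accepts no value.
parser.add_argument("-v", action="count", default=0, dest="verbose")
parser.add_argument("-e", type=int, dest="end")
parser.add_argument("callback")
parser.add_argument("blockchain_dir")

args = parser.parse_args(["-vv", "-e", "1000", "balances", "."])
print(args.verbose, args.end, args.callback)
```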

Goro2030 commented 4 years ago

Oh, I see, thanks!

I'd like to get the balance output in BTC rather than satoshis... I have no Rust experience, so I tried this:

self.writer
            .write_all(format!("{};{}\n", "address", "balance/100000000").as_bytes())

But it's not working. Which line should I modify locally and recompile to achieve this?

gcarq commented 4 years ago

That would be:

diff --git a/src/callbacks/balances.rs b/src/callbacks/balances.rs
index 18684fc..7ef75c6 100644
--- a/src/callbacks/balances.rs
+++ b/src/callbacks/balances.rs
@@ -125,8 +125,9 @@ impl Callback for Balances {
         }

         for (address, balance) in balances.iter() {
+            let balance = *balance as f64 / 100000000.0;
             self.writer
-                .write_all(format!("{};{}\n", address, balance).as_bytes())
+                .write_all(format!("{};{:.8}\n", address, balance).as_bytes())
                 .unwrap();
         }
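The diff above converts through `f64` and rounds to 8 decimals. An alternative that avoids floating point altogether is to split the satoshi amount with integer division and zero-pad the remainder; it is sketched here in Python for brevity (the same idea maps onto integer `/` and `%` in Rust), and `sats_to_btc` is a hypothetical helper name:

```python
SATOSHIS_PER_BTC = 100_000_000

def sats_to_btc(sats: int) -> str:
    """Format a non-negative satoshi amount as BTC with 8 decimals, using only integers."""
    if sats < 0:
        raise ValueError("expected a non-negative amount")
    whole, frac = divmod(sats, SATOSHIS_PER_BTC)
    return f"{whole}.{frac:08d}"

print(sats_to_btc(20855000))
```

For the address checked earlier in the thread, 20855000 satoshis corresponds to 0.20855 BTC.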
Goro2030 commented 4 years ago

Man, @gcarq, you're awesome! This is the kind of development support I'd like to see from Microsoft, for example ;)

Will report back my findings as soon as this lengthy process finishes.

BTW: have you given any thought to using multiple threads to speed up execution? Your processing is single-threaded, and I was thinking that if you spawned multiple workers, each looking at a different block, you could make better use of both disk and CPU. Of course, the code would get more complex, since the workers would have to communicate to keep the hashmap consistent, but it could greatly improve performance.

Goro2030 commented 4 years ago

With -v, I'm seeing thousands of "Ignoring invalid utxo" messages... I didn't know the blockchain had so many crappy transactions?

[13:21:01] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 886bb78906f25eb27874f35de0790c21ee6997af5653fce5aae0aa62e4b96db7 (NotRecognised)
[13:21:01] DEBUG - unspentcsvdump: Ignoring invalid utxo in: cd038bb4dd6ab27fa1598a57ff2a45613987a79da8a2da257ce05ff8c1dad33d (NotRecognised)
[13:21:01] DEBUG - unspentcsvdump: Ignoring invalid utxo in: c3940f15d4286c8ab31ef8d110fa218467aa3562e0f17ccaafe5a5984c499f23 (NotRecognised)
[13:21:02] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 6c0dbd99e4b21bc19eb7c81fe1d6de9f22dc3afc644c2820bf69b81c55fe8968 (NotRecognised)
[13:21:02] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 9537d35ea2797b95d2ec895b3e9067c2004c1e9fb5ed0fc133c6cdb1586e2961 (NotRecognised)
[13:21:02] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 1cbd40998c28e34ee05b4cbaf4d75a9f28b687629814c6be63bd95d37d7db4c7 (NotRecognised)
[13:21:02] DEBUG - unspentcsvdump: Ignoring invalid utxo in: d8b56aa714e02946dfa6c209106256e76f6da4d3f1bd811b1fd1f69394a7a144 (NotRecognised)
[13:21:02] DEBUG - unspentcsvdump: Ignoring invalid utxo in: c87d84b14a1652e030a1b7448ac9f2ceda32c27c0f265ff7d8bcebb7c4f76faf (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: c456c6f4f7643795360c6da9d0fdf06dc6f89872cf37eadc781e0d811eaea066 (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 8470462b76a8aa35dcada1a01558ee58d27a2f55932b75e1ba8d3a08e953f0e0 (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: a2c612cbd00f65b74d2187516d79b9f8795723d32feec3e67387cb9c8dcd5715 (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 0be518076f6c818c30769e7b6815f4fb61072622523a69a343f6071f06849c04 (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: dc250e42764802dc4b4636a112324f5592da630a6ca00b94ab789e100fcedd28 (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: b1ab1c3437d1ddf062bb06fc2cf6b3eb83088df9af1e8bd7b9a5c0821ce09d4c (NotRecognised)
[13:21:03] DEBUG - unspentcsvdump: Ignoring invalid utxo in: 6195e53a9fca96a5ce448408ec1eb4195b023a2390be335d198a09be269e9052 (NotRecognised)
gcarq commented 4 years ago

Some previous iterations were multithreaded, but with that, memory consumption also skyrocketed in some cases. Right now I'm thinking about making it async, which could help a bit, but I haven't decided yet.

Not all of those transactions are crappy; some of them are just non-standard, or their script types are not implemented in this parser yet.

Goro2030 commented 4 years ago

I see. Couldn't all the workers share the same hash table in memory and "collaborate", instead of each one creating its own in its own memory space?


gcarq commented 4 years ago

For some callbacks this could be possible, but not for all. I think async could be a good middle ground, but I'm open to suggestions and pull requests. BTW: I merged the branch; it should be stable.
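One way to combine parallelism with a single shared map, sketched under simplifying assumptions: workers decode blocks independently (the expensive part) and return only what each block creates and spends, while one thread applies those results to the map strictly in block order, because a block can spend outputs created by earlier blocks. The raw block format and the `decode_block`/`build_utxos_parallel` names are hypothetical; threads are used only to show the structure, and in CPython a `ProcessPoolExecutor` would be needed for real CPU parallelism:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw block: a list of (txid, spent outpoints, outputs as (value_sat, address)).
RawBlock = list[tuple[str, list[tuple[str, int]], list[tuple[int, str]]]]
Outpoint = tuple[str, int]

def decode_block(raw: RawBlock) -> tuple[dict[Outpoint, tuple[int, str]], list[Outpoint]]:
    """Worker step: report what one block creates and spends, touching no shared state."""
    created: dict[Outpoint, tuple[int, str]] = {}
    spent: list[Outpoint] = []
    for txid, inputs, outputs in raw:
        spent.extend(inputs)
        for index, (value, address) in enumerate(outputs):
            created[(txid, index)] = (value, address)
    return created, spent

def build_utxos_parallel(raw_blocks: list[RawBlock], workers: int = 4) -> dict[Outpoint, tuple[int, str]]:
    """Decode blocks concurrently, but apply their effects to one map in chain order."""
    utxos: dict[Outpoint, tuple[int, str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, even if workers finish out of order.
        for created, spent in pool.map(decode_block, raw_blocks):
            # Add this block's outputs first so spends within the same block also resolve.
            utxos.update(created)
            for outpoint in spent:
                utxos.pop(outpoint, None)
    return utxos

blocks = [
    [("a", [], [(5000, "addr1")])],
    [("b", [("a", 0)], [(3000, "addr2"), (1900, "addr1")]),
     ("c", [("b", 0)], [(2900, "addr3")])],
]
print(build_utxos_parallel(blocks))
```

`Executor.map` submits every block up front, so decoded results can pile up in memory if decoding outpaces the merge, which echoes the memory concern raised above; feeding the pool through a bounded queue would be the usual way to cap that.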

gcarq commented 4 years ago

I've released version 0.8.0 with the balances callback. I'm closing this; feel free to reopen.