gcarq / rusty-blockparser

Bitcoin blockchain parser written in Rust
GNU General Public License v3.0

RAM utilization / tweaking for performance #18

Closed mikewerwin closed 4 years ago

mikewerwin commented 7 years ago

Howdy,

In reading the README, I gathered that the RAM footprint would easily fit within my VM's resources (12 procs, 2 TB storage, 48 GB RAM), but when I ran the example 'blockparser -t 3 csvdump' the system got to the brink and the kernel ended up reaping the process before it panicked. I haven't investigated the code yet, but my understanding was that most of the processing (after chain.json is loaded into RAM) is done via disk I/O.

Oh, one thing that may have polluted the results: I used 8 worker threads (-t 8).

Thanks for any quick guidance on where I should look!

Best, -ME

gcarq commented 7 years ago

Hi,

how did you build and execute the binary?

cheers

mikewerwin commented 7 years ago

I built it from source by following your README instructions.

Here are the system specs, if that helps:

CentOS Linux release 7.3.1611 (Core)
cargo 0.20.0
rustc 1.19.0

I also used your example LLVM wrapper, which, now that I think about it, might be the source of my problem.

Thoughts?


gcarq commented 7 years ago

Yes, I think the wrapper might be the root cause. Can you test it without the wrapper? Also verify that you are running the release build and not the debug one (e.g. cargo build --release or cargo run --release).

Where is your working directory? Are you using tmpfs?

mikewerwin commented 7 years ago

Thanks for the note!

Yep, it was built with --release.

I just did a fresh rebuild (without the wrapper):

[me@gallifrey rusty-blockparser]# cargo build --release
   Compiling bitflags v0.7.0
   ...
   Compiling rust-base58 v0.0.4
   Compiling rusty-blockparser v0.6.0 (file:///usr/local/src/rusty-blockparser)
    Finished release [optimized] target(s) in 100.58 secs

I’m building in: /usr/local/src/rusty-blockparser/ and executing in: /usr/local/src/rusty-blockparser/target/release/

(no tmpfs involvement for either)

./rusty-blockparser -t 3 csvdump /home/data/ (I went back to 3 worker threads, too)

The /home/data directory is a large fast volume, but not ram-based.

I’m executing in a screen session, and so far it looks much happier (292k blocks left) with only 2.1 GB of RAM used:

[14:49:38] INFO - dispatch: Status: 192292 Blocks processed. (left: 292743, avg: 266 blocks/sec)

  PID USER      PR  NI    VIRT     RES    SHR  S  %CPU %MEM    TIME+   COMMAND
15356 root      20   0  2495312  2.146g   2464 R 395.0  4.5  54:11.48  rusty-blockpars
 5374 bitcoind  20   0  2356548  348508  13948 S   1.0  0.7  31:53.17  bitcoind

I bet we've figured it out. I’ll report back if for some reason it goes south; if it misbehaves, I’ll also run with higher verbosity and tracing to isolate the problem.

Thanks again, -ME


gcarq commented 7 years ago

I'm glad to hear that it works. PS: There is no SegWit support yet (see https://github.com/gcarq/rusty-blockparser/issues/13 for a workaround), so if your blockchain is up to date the parser will throw an exception at some point.

mikewerwin commented 7 years ago

Good reminder, thanks! I knew it would start to barf on the SegWit blocks.

PS: Nice code! I took a brief tour while it’s been running.


gcarq commented 7 years ago

Thanks! Did it work as expected this time?

mikewerwin commented 7 years ago

Well… I’m actually having a little bit of an odd time with it. After concluding that it was working, I went back to do this:

./rusty-blockparser -t 8 unspentcsvdump /home/data/

What happened was that the system began to consume RAM again until 90% (~45 GB) was in use. I watched it run and noticed RAM use slowly creeping up as it worked through the blocks (~2 hours total). I ended up ^C’ing out of it when it had processed around 375k blocks, to head off the eventual kernel reap.

A few thoughts:

  1. My use of a larger thread pool (-t 8) may cause the main management process to fail to handle something while juggling the workers.
  2. There’s a small memory leak that gets exacerbated when many threads are in use.
  3. It’s doing what it’s designed to do, and the main event loop uses RAM to buffer results as it walks the chain, flushing at the very end.

One other thought: I noticed that the output results were NOT being written to the file as it was processing, which is what led me to #3 in the list above. I did notice results along the way in my first execution, though, so that behavior was inconsistent.
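
For what it's worth, buffered writers in Rust only hit the disk when their internal buffer fills or is explicitly flushed, which can make output appear in bursts, or all at once at the end. A minimal illustrative sketch of that behavior (not the parser's actual code):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("dump.csv")?;
    // BufWriter only writes to disk when its 64 KiB buffer fills
    // (or on flush/drop), so tailing dump.csv can show nothing for
    // a long time even though rows are being produced steadily.
    let mut writer = BufWriter::with_capacity(64 * 1024, file);
    for i in 0..1_000_000u64 {
        writeln!(writer, "{};{}", i, i * 2)?;
    }
    writer.flush()?; // rows are only guaranteed on disk after this
    Ok(())
}
```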

I’m not familiar with Rust, but I’m about to dive into the code and run it through a debugger to see where things are crunching.

If you have some thoughts, I’d love to hear them, especially around the conceptual design of the code (what it’s supposed to do).

Cheers! -ME

"Consider - One: Probability is a factor which operates within natural forces. Two: Probability is not operating as a factor. Three: We are now held within un-, sub-, or supernatural forces."
Mike W. Erwin <mikee@caffeine.net mailto:mikee@borrowedtime.com>

On Sep 14, 2017, at 9:22 AM, Michael Egger notifications@github.com wrote:

Thanks! Did it work as expected this time?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gcarq/rusty-blockparser/issues/18#issuecomment-329497819, or mute the thread https://github.com/notifications/unsubscribe-auth/AVhfQ3L3qc6s-oZ1puiPQCO9wW22vgQYks5siTaOgaJpZM4PVOyX.

gcarq commented 7 years ago

Your third assumption is correct. Compared to csvdump, the current unspentcsvdump implementation is very expensive, because the callback keeps track of all transactions in an internal HashMap and removes them when the tx_output is spent (this HashMap is never written to file until the very end).
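
A minimal sketch of that pattern (hypothetical types and names, not the actual callback code): every tx_output inserts an entry keyed by (txid, index), every tx_input removes the entry it spends, and whatever remains after the full chain walk is the UTXO set, so the map holds the entire UTXO set in RAM.

```rust
use std::collections::HashMap;

/// Hypothetical outpoint key: (txid, output index).
type OutPoint = ([u8; 32], u32);

struct UnspentTracker {
    /// Unspent outputs mapped to their value in satoshis. This map
    /// grows with the UTXO set and is only dumped after the last block.
    unspent: HashMap<OutPoint, u64>,
}

impl UnspentTracker {
    /// Called for every transaction output encountered.
    fn on_output(&mut self, txid: [u8; 32], index: u32, value: u64) {
        self.unspent.insert((txid, index), value);
    }

    /// Called for every transaction input; spending removes the entry.
    fn on_input(&mut self, prev_txid: [u8; 32], prev_index: u32) {
        self.unspent.remove(&(prev_txid, prev_index));
    }
}

fn main() {
    let mut tracker = UnspentTracker { unspent: HashMap::new() };
    tracker.on_output([0u8; 32], 0, 5_000_000_000); // a 50 BTC output
    tracker.on_input([0u8; 32], 0);                 // later spent
    assert!(tracker.unspent.is_empty());
}
```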

I don't know if the unspentcsvdump callback could be rewritten to handle the current number of transactions (I have to do the math first).

I think it would be better to use csvdump and do the rest in a post-processing step.

cheers, Michael

mikewerwin commented 7 years ago

Ah, that makes sense. Thanks for the guidance. My first strategy was post-processing anyhow, so I have that in motion already.
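
In case it helps anyone else, here is a sketch of that post-processing step over csvdump's output (the file names and column layout here are assumptions for illustration): collect every outpoint spent by some input, then stream the outputs and keep only the rows nobody spends.

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Write};

fn main() -> std::io::Result<()> {
    // Pass 1: collect every outpoint referenced by an input.
    // Assumed row layout for tx_in.csv: prev_txid;prev_index;...
    let mut spent: HashSet<String> = HashSet::new();
    for line in BufReader::new(File::open("tx_in.csv")?).lines() {
        let line = line?;
        let mut cols = line.split(';');
        if let (Some(txid), Some(index)) = (cols.next(), cols.next()) {
            spent.insert(format!("{};{}", txid, index));
        }
    }

    // Pass 2: stream the outputs and keep those never spent.
    // Assumed row layout for tx_out.csv: txid;index;...
    // The spent set still needs RAM, but only one small string per
    // outpoint rather than a full output record.
    let mut out = BufWriter::new(File::create("unspent.csv")?);
    for line in BufReader::new(File::open("tx_out.csv")?).lines() {
        let line = line?;
        let key = line.splitn(3, ';').take(2).collect::<Vec<_>>().join(";");
        if !spent.contains(&key) {
            writeln!(out, "{}", line)?;
        }
    }
    out.flush()
}
```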

Thanks again! ME


gcarq commented 4 years ago

I'm closing this. A lot has changed with version 0.7.0, and memory consumption should be a lot lower now. Feel free to reopen.