ChainSafe / forest

🌲 Rust Filecoin Node Implementation
https://forest.chainsafe.io
Apache License 2.0

snapshot validate: unbounded memory #3302

Open LesnyRumcajs opened 1 year ago

LesnyRumcajs commented 1 year ago

Issue summary

The memory needed to validate a mainnet snapshot seems unbounded. On a machine with 16 GB of RAM, it causes an OOM after a minute or so.

According to @lemmih, the culprit is the FVM:

Sigh, I think the FVM is taking up 19GiB of RAM. We'll need to address that at some point.

Command: forest-cli snapshot validate <mainnet snapshot>
Commit: b03ca5d61a18c236fcaa9bfebeae706108e6ed85

Other information and links

aatifsyed commented 1 year ago

This kills one of our killer features :/

hanabi1224 commented 1 year ago

Could it be mitigated by limiting parallelization in validate_tipsets?

lemmih commented 1 year ago

Could it be mitigated by limiting parallelization in validate_tipsets?

Yep, thus killing our killer feature.
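
For illustration, a minimal sketch of what bounding that parallelism could look like, assuming a rayon-style fan-out; Tipset, validate_tipset, and validate_tipsets_bounded are placeholders, not forest's actual API:

    use rayon::prelude::*;
    use rayon::ThreadPoolBuilder;

    struct Tipset; // placeholder for forest's tipset type

    fn validate_tipset(_ts: &Tipset) -> Result<(), String> {
        Ok(()) // placeholder for the real FVM-backed validation
    }

    // Cap the worker threads used for tipset validation. This bounds how
    // many FVM engines (each reserving up to 2 GiB) can be live at once,
    // at the cost of proportionally slower validation.
    fn validate_tipsets_bounded(tipsets: &[Tipset], max_threads: usize) -> Result<(), String> {
        let pool = ThreadPoolBuilder::new()
            .num_threads(max_threads)
            .build()
            .map_err(|e| e.to_string())?;
        pool.install(|| tipsets.par_iter().try_for_each(validate_tipset))
    }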

lemmih commented 1 year ago

I think the WASM engine settings might be to blame. https://github.com/filecoin-project/ref-fvm/blob/f31c6d3a64278f98270e5a13fc6e8be11e5c534e/fvm/src/engine/mod.rs#L137

    // wasmtime default: OnDemand
    // We want to pre-allocate all permissible memory to support the maximum allowed recursion limit.
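
For context, a sketch of those two knobs via wasmtime's public Config API (2023-era versions, plus the anyhow crate); the strategy and value shown are illustrative, not the FVM's actual configuration:

    use wasmtime::{Config, Engine, InstanceAllocationStrategy};

    fn make_engine(instance_memory_maximum_size: u64) -> anyhow::Result<Engine> {
        let mut c = Config::new();
        // wasmtime's default: allocate instance memory lazily, on demand.
        c.allocation_strategy(InstanceAllocationStrategy::OnDemand);
        // Cap the reserved size of a single static Wasm linear memory
        // (wasmtime's own default on 64-bit hosts is a 4 GiB reservation).
        c.static_memory_maximum_size(instance_memory_maximum_size);
        Engine::new(&c)
    }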

Things to investigate:

LesnyRumcajs commented 1 year ago

Isn't wasm32 limited to 4 GB?

lemmih commented 1 year ago

Isn't wasm32 limited to 4 GB?

I think they even lower the limit from 4GiB to 2GiB. But they have a pool of engines, one for each core, each with a 2GiB limit.

    /// Maximum size of memory used during the entire (recursive) message execution. This currently
    /// includes Wasm memories and table elements and will eventually be extended to include IPLD
    /// blocks and actor code.
    ///
    /// DEFAULT: 2GiB
    pub max_memory_bytes: u64,
    // wasmtime default: 4GB
    c.static_memory_maximum_size(instance_memory_maximum_size);

LesnyRumcajs commented 1 year ago

So on my 32 cores it would require 64GB?

lemmih commented 1 year ago

So on my 32 cores it would require 64GB?

As far as I can tell, yes.
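
Spelled out as a back-of-the-envelope sketch (assuming the 2 GiB-per-engine default quoted above; 32 cores gives 32 × 2 GiB = 64 GiB):

    use std::thread;

    // Worst-case reservation: one engine per core, each allowed
    // max_memory_bytes (2 GiB by default).
    fn worst_case_engine_memory_bytes() -> u64 {
        const MAX_MEMORY_BYTES: u64 = 2 * 1024 * 1024 * 1024; // 2 GiB
        let cores = thread::available_parallelism()
            .map(|n| n.get() as u64)
            .unwrap_or(1);
        cores * MAX_MEMORY_BYTES
    }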

sudo-shashank commented 1 year ago

| Change Description | Network | No. of Threads | Epochs Validated | Snapshot Info | RSS | VSZ |
| --- | --- | --- | --- | --- | --- | --- |
| BaseLine | Calibnet | 1 | 60 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 739.01 MB | 3036361.56 MB |
| BaseLine | Calibnet | 2 | 60 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 762.50 MB | 3036432.92 MB |
| BaseLine | Calibnet | 4 | 60 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 846.67 MB | 3036739.99 MB |
| BaseLine | Calibnet | 8 | 60 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 866.19 MB | 3037035.18 MB |
| BaseLine | Calibnet | 1 | 1999 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 898.11 MB | 3036352.50 MB |
| BaseLine | Calibnet | 2 | 1999 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 878.01 MB | 3036441.77 MB |
| BaseLine | Calibnet | 4 | 1999 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 900.31 MB | 3036771.31 MB |
| BaseLine | Calibnet | 8 | 1999 | forest_snapshot_calibnet_2023-08-14_height_822490.forest.car.zst (1.9 GB) | 934.74 MB | 3037106.39 MB |
| BaseLine | Mainnet | 1 | 60 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4020.62 MB | 3048476.51 MB |
| BaseLine | Mainnet | 2 | 60 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4056.17 MB | 3048855.77 MB |
| BaseLine | Mainnet | 4 | 60 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4107.39 MB | 3048985.30 MB |
| BaseLine | Mainnet | 8 | 60 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4088.47 MB | 3048548.39 MB |
| BaseLine | Mainnet | 1 | 120 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4519.46 MB | 3049692.50 MB |
| BaseLine | Mainnet | 2 | 120 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4561.81 MB | 3049611.76 MB |
| BaseLine | Mainnet | 4 | 120 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 4613.37 MB | 3049918.31 MB |
| BaseLine | Mainnet | 8 | 120 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | | |
| BaseLine | Mainnet | 8 | 1500 | forest_snapshot_mainnet_2023-08-14_height_3122221.forest.car.zst (57 GB) | 14523.47 MB | |

lemmih commented 1 year ago

@sudo-shashank What are you measuring?

sudo-shashank commented 1 year ago

@sudo-shashank What are you measuring?

I'm trying to measure the memory held during the validate run, using the ps -o rss= -p "$pid" command in a script.
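
For reference, a minimal Linux-only sketch of the same measurement done natively: poll VmRSS from /proc/<pid>/status once per second and track the peak, equivalent in spirit to looping over ps -o rss= in a shell script:

    use std::{env, fs, thread, time::Duration};

    fn main() {
        // usage: rss-watch <pid>
        let pid: u32 = env::args()
            .nth(1)
            .expect("usage: rss-watch <pid>")
            .parse()
            .expect("pid must be a number");
        let path = format!("/proc/{pid}/status");
        let mut peak_kib: u64 = 0;
        // Sample once per second until the target process exits.
        while let Ok(status) = fs::read_to_string(&path) {
            if let Some(line) = status.lines().find(|l| l.starts_with("VmRSS:")) {
                let kib: u64 = line.split_whitespace().nth(1).unwrap().parse().unwrap();
                peak_kib = peak_kib.max(kib);
                println!("rss: {} MiB (peak: {} MiB)", kib / 1024, peak_kib / 1024);
            }
            thread::sleep(Duration::from_secs(1));
        }
    }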

lemmih commented 1 year ago

How many epochs are you validating and how many threads are you using?

lemmih commented 1 year ago

(As noted in this issue, memory usage depends entirely on how many threads you're using, so that is vital information you must include in your results.)

sudo-shashank commented 1 year ago

How many epochs are you validating and how many threads are you using?

60 epochs for now, 8 threads

lemmih commented 1 year ago

How many epochs are you validating and how many threads are you using?

60 epochs now, single core

For calibnet, that should only take a few seconds to evaluate. You'll get better data if you benchmark for longer than a few seconds.

lemmih commented 1 year ago

How many epochs are you validating and how many threads are you using?

60 epochs now, single core

When you say a single core, do you mean a single thread? Using a single thread to reproduce a problem that only happens when you use a lot of threads isn't wise.

sudo-shashank commented 1 year ago

How many epochs are you validating and how many threads are you using?

60 epochs now, single core

When you say a single core, do you mean a single thread? Using a single thread to reproduce a problem that only happens when you use a lot of threads isn't wise.

I checked the config: I am using 4 cores and I have 16 GiB of RAM available. The expected peak RSS for forest snapshot validate was 8 GiB (4 × 2 GiB), but I am getting a peak RSS of only 4 GiB for a mainnet snapshot validation.

lemmih commented 1 year ago

How many epochs are you validating and how many threads are you using?

60 epochs now, single core

When you say a single core, do you mean a single thread? Using a single thread to reproduce a problem that only happens when you use a lot of threads isn't wise.

I checked the config: I am using 4 cores and I have 16 GiB of RAM available. The expected peak RSS for forest snapshot validate was 8 GiB (4 × 2 GiB), but I am getting a peak RSS of only 4 GiB for a mainnet snapshot validation.

The exact amount of memory used is not important. What is important is how the memory usage scales with the number of threads.

sudo-shashank commented 1 year ago

In my observation so far, for both mainnet and calibnet, memory usage does not scale with the number of threads; rather, it scales with the number of epochs we validate. More epochs means more memory utilisation, peaking at a maximum of 15 GiB for 1999 epochs of a mainnet snapshot.

lemmih commented 1 year ago

Moving @sudo-shashank to different tasks.

ruseinov commented 1 year ago

I have tried this several times with forest-tool snapshot validate --check-links=0 forest_mainnet.forest.car --check-stateroots=2000.

I have noticed that the memory usage depends on where we are in the queue.

For example, when I'm at ~1500 stateroots in the queue, the memory usage is steady at ~12 GB and it manages to clean up the extra memory just fine, but then it seems to start growing again. With 1100 items left in the queue, it's about 15 GB. So the further down the rabbit hole we go, the more memory is used.

The auto-detected parallelism is 10 on my machine.

I have tried a chunked approach, where the MultiEngine is reinitialised every n items, but that does not seem to have any impact when chunking by 100. Chunks of 20 do seem to have a positive impact on the memory footprint, but they hurt performance more, because we are forced to wait until the current chunk is processed before starting the next one.
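
For illustration, a minimal sketch of that chunked strategy; the types and the validation entry point below are placeholders for the real forest/ref-fvm APIs, which may differ:

    struct MultiEngine; // placeholder for fvm's engine pool
    impl MultiEngine {
        fn new() -> Self {
            MultiEngine
        }
    }
    struct Tipset; // placeholder for forest's tipset type

    fn validate_chunk(_engine: &MultiEngine, _chunk: &[Tipset]) {
        // run the FVM-backed validation for this chunk
    }

    // Process tipsets in fixed-size chunks, recreating the engine pool
    // between chunks so whatever memory it holds can be reclaimed on drop.
    fn validate_in_chunks(tipsets: &[Tipset], chunk_size: usize) {
        for chunk in tipsets.chunks(chunk_size) {
            let engine = MultiEngine::new(); // fresh pool per chunk
            validate_chunk(&engine, chunk);
            // `engine` drops here; ideally its wasmtime allocations are
            // released before the next chunk starts.
        }
    }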

ruseinov commented 1 year ago

I have also tried an approach that initialises an engine for each tipset, just to see what that does: it slows things down almost to a halt. I'm going to do memory profiling next to see what exactly is eating up the RAM. I'm concerned that the memory does not get cleaned up properly with the chunked approach and reinitialisation.