
Warp sync triggers OOM on Astar #1110

Open bLd75 opened 9 months ago

bLd75 commented 9 months ago

Description

Warp sync is not operational on Astar in the latest versions: after downloading the state (5.3+ GiB), importing it triggers an OOM on the server.

Steps to Reproduce

Start an Astar node with the --sync warp option, as sketched below.
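A minimal reproduction sketch (binary path and chain taken from the flags later in this thread; the base path is a hypothetical example, adapt to your setup):

# hypothetical base path; adjust to your environment
/usr/local/bin/astar-collator \
  --chain astar \
  --sync warp \
  --base-path /var/lib/astar

The OOM hits during the state-import phase that follows the state download.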

Environment

Quite similar to this issue, but on the parachain side. The issue will be solved after uplifting to Polkadot v1.0.0.

ashutoshvarma commented 4 months ago

It's resolved, right?

bLd75 commented 3 months ago

@ashutoshvarma we have an OOM case currently ongoing on the latest Astar version. @paradox-tt will provide details here.

Dinonard commented 3 months ago

We're still uplifting & catching up to the latest version.

But please do provide the command, environment, and logs if you have them.
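For the logs, one quick way to capture them is via journalctl (a sketch; the unit name astar-node is an assumption, use whatever your service is called):

# dump the collator's journal since the warp-sync attempt started
journalctl -u astar-node --since today --no-pager > astar-warp.log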

paradox-tt commented 3 months ago

Hey team,

Here are my flags, with the public address hidden:

ExecStart=/usr/local/bin/astar-collator \
  --validator \
  --rpc-cors all \
  --name Dox-Astar-01 \
  --execution wasm \
  --state-cache-size 1 \
  --chain astar \
  --public-addr=/ip4/x.x.x.x/tcp/30330 \
  --listen-addr=/ip4/172.19.12.15/tcp/30330 \
  --bootnodes /ip4/20.93.150.146/tcp/30330/p2p/12D3KooWKZwcaofXPmXWHSSfnh34VFJ8zSRJScnNu9UA75x8kNXi \
  --allow-private-ipv4 \
  --discover-local \
  --rpc-port=9110 \
  --prometheus-external \
  --prometheus-port=9702 \
  --rpc-methods=Unsafe \
#  --sync=warp \
  --blocks-pruning=1000 \
  --state-pruning=1000 \
  --telemetry-url 'wss://telemetry-backend.w3f.community/submit/ 1' \
  --telemetry-url 'wss://telemetry.polkadot.io/submit/ 1' \
#  --relay-chain-rpc-urls "wss://rpc.ibp.network/polkadot" \

There's no error in the logs; warping just continues until the server runs out of memory or the instance reboots.
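Unrelated to the root cause, but one way to keep the host itself alive while reproducing: cap the service's memory with standard systemd resource control (a sketch, assuming cgroup v2; the unit name astar-node.service is an assumption), so the kernel kills only the collator instead of taking the whole instance down:

# /etc/systemd/system/astar-node.service.d/memory.conf  (hypothetical drop-in)
[Service]
MemoryHigh=28G
MemoryMax=30G

Then run systemctl daemon-reload && systemctl restart astar-node. MemoryHigh throttles the service as it approaches the limit; MemoryMax is the hard cap at which the OOM killer targets only this unit.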

Jun 12 11:23:56 doxastar astar-collator[52243]: 2024-06-12 11:23:56 [Parachain] ⏩ Warping, Downloading state, 406.43 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 0.7kiB/s ⬆ 0.4kiB/s
Jun 12 11:24:00 doxastar astar-collator[52243]: 2024-06-12 11:24:00 [Relaychain] ✨ Imported #21184086 (0x5200…14a5)
Jun 12 11:24:01 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Relaychain] 💤 Idle (15 peers), best: #21184086 (0x5200…14a5), finalized #21184083 (0xb341…3109), ⬇ 145.2kiB/s ⬆ 192.6kiB/s
Jun 12 11:24:02 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Parachain] ⏩ Warping, Downloading state, 409.57 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 272.5kiB/s ⬆ 0.9kiB/s
Jun 12 11:24:06 doxastar astar-collator[52243]: 2024-06-12 11:24:06 [Relaychain] ✨ Imported #21184087 (0x6345…6470)
-- Boot 5e8c89c8388a471daea298612802f1e0 --
Jun 12 11:27:01 doxastar systemd[1]: Started Astar Node.
Jun 12 11:27:01 doxastar astar-collator[738]: `--state-cache-size` was deprecated. Please switch to `--trie-cache-size`.
Jun 12 11:27:01 doxastar astar-collator[738]: CLI parameter `--execution` has no effect anymore and will be removed in the future!
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 Astar Collator
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ✌️  version 5.39.1-111d18fbfba
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ❤️  by Stake Technologies <devops@stake.co.jp>, 2019-2024
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 📋 Chain specification: Astar
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 🏷  Node name: Dox-Astar-01
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 👤 Role: AUTHORITY
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 💾 Database: RocksDb at /home/astar_1/.local/share/astar-collator/ch
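Side note on the two startup warnings above: following the node's own messages, --state-cache-size can be swapped for --trie-cache-size (value carried over from the unit file; whether 1 is a sensible cache size is a separate question), and --execution can be dropped entirely:

# in the ExecStart block above:
  --trie-cache-size 1 \
# (was: --state-cache-size 1; --execution wasm removed, it has no effect anymore)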
bLd75 commented 2 months ago

Update on the pre-v5.42.0 client test: the issue is still the same. In my tests with 32 GB RAM, the node always gets OOM at the same point: importing state at 5762.42 MiB. Once it reaches this state size, memory suddenly fills and bursts to 100% in less than 2 minutes. [memory usage graph]

bLd75 commented 2 months ago

I can't see any significant correlation to disk usage, meaning the problem is specifically RAM usage by warp sync. [disk vs. RAM usage graph]
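For anyone wanting to reproduce these measurements, a simple sampling sketch (the process name matches the binary above; DB_PATH must be set to the database directory printed in the startup log, both are assumptions about your deployment):

# sample collator RSS and database size every 10 seconds
DB_PATH=/path/to/astar-collator/db   # set to the "Database: RocksDb at ..." path from the log
while true; do
  rss_kib=$(ps -o rss= -C astar-collator | awk '{s+=$1} END {print s+0}')
  db_mib=$(du -sm "$DB_PATH" | cut -f1)
  echo "$(date -Is) rss=$((rss_kib / 1024))MiB db=${db_mib}MiB"
  sleep 10
done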

bLd75 commented 2 months ago

More insights on memory over a short time frame: [memory usage graphs]