Cardinal-Cryptography / aleph-node-issues

Issue tracker for aleph-node related problems.

High CPU/memory usage possibly causing dropped blocks #2

Closed. Loo399 closed this issue 1 year ago.

Loo399 commented 1 year ago

Did you read the documentation and guides?

Is there an existing issue?

Description of the problem

The validator suddenly stops and requires a server/container reboot, possibly due to surges in CPU/RAM usage.

[Image: CPU utilization graph]

No obvious cause found in the logs. First crash (the first large spike in the CPU utilization graph): logs.txt. The logs show when the server was rebooted.

Second crash, a few hours later: logs_2.txt

Information on your setup.

  1. Running on Mainnet
  2. version 0.10.0+mainnet-1023252c22e
  3. Validator node run using aleph-node-runner
  4. (run with aleph-node-runner)
  5. Ubuntu 22.04.1 LTS

Steps to reproduce

No response

Did you attach relevant logs?

Marcin-Radecki commented 1 year ago

Hi, we're currently investigating the provided logs. We'll get back to you as soon as we can.

In the meantime, could you please provide the hardware specs of the machine you're running aleph-node on?

Loo399 commented 1 year ago

Thank you for your reply. I'm running a c5.xlarge instance on AWS (4 vCPUs) with a 1024 GiB gp2 volume, on Ubuntu.

Marcin-Radecki commented 1 year ago

Hi, the c5.xlarge is not sufficient anymore for aleph-node version 0.10.0+mainnet-1023252c22e. This version requires at least 8 GB to operate, e.g. c5.2xlarge.

We'll update our hardware requirements to match that. Thanks for reporting and let us know whether this solved the issue.

Loo399 commented 1 year ago

I double-checked: c5.xlarge does have 8 GiB of memory, while c5.2xlarge has 16 GiB. So far I haven't had any further issues, especially since adding this Substrate block watcher script.
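The watcher script itself isn't included in this thread. Purely as an illustration, here is a minimal sketch of what such a watcher might do: poll the node's best block via the standard Substrate RPC method `chain_getHeader` and warn when the block number stops advancing. It assumes the node serves JSON-RPC over HTTP at `http://127.0.0.1:9933`, which may not match your aleph-node-runner setup. `StallDetector`, `watch`, the 120-second threshold and the 15-second poll interval are hypothetical choices, not part of aleph-node or the script mentioned above.

```python
import json
import time
import urllib.request

# Assumption: the node serves Substrate's JSON-RPC over HTTP at this address.
# Adjust host/port to match your own aleph-node-runner configuration.
RPC_URL = "http://127.0.0.1:9933"
STALL_SECONDS = 120  # hypothetical threshold: warn if no new block for 2 minutes


def parse_block_number(header: dict) -> int:
    """Substrate encodes the header's block number as a hex string, e.g. '0x1a'."""
    return int(header["number"], 16)


def fetch_best_block_number(url: str = RPC_URL) -> int:
    """Call chain_getHeader (no params = best block) and return its number."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "chain_getHeader", "params": []}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_block_number(json.load(response)["result"])


class StallDetector:
    """Remembers the highest block seen and when it last increased."""

    def __init__(self, stall_seconds: float = STALL_SECONDS):
        self.stall_seconds = stall_seconds
        self.last_number = None
        self.last_change = None

    def update(self, number: int, now: float) -> bool:
        """Record one observation; return True if the chain looks stalled."""
        if self.last_number is None or number > self.last_number:
            self.last_number = number
            self.last_change = now
            return False
        return now - self.last_change >= self.stall_seconds


def watch(url: str = RPC_URL, poll_seconds: float = 15) -> None:
    """Poll forever, printing a warning whenever block production stalls."""
    detector = StallDetector()
    while True:
        try:
            number = fetch_best_block_number(url)
        except OSError as exc:  # connection refused, timeout, etc.
            print(f"RPC unreachable: {exc}")
        else:
            if detector.update(number, time.monotonic()):
                print(f"No new block since #{number}; the node may be stuck")
        time.sleep(poll_seconds)
```

Calling `watch()` from a small script or a service unit would run it continuously. Keeping the stall logic in `StallDetector`, separate from the network call, makes it easy to swap the warning for a restart hook or an alert.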

Marcin-Radecki commented 1 year ago

Sorry, I meant we need 16 GB now; you're right, c5.xlarge has 8 GB. If you run into the RAM spikes again, we suggest upgrading to c5.2xlarge or c6i.2xlarge.
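To confirm whether a host meets the 16 GB figure above, here is a minimal sketch that reads `MemTotal` from `/proc/meminfo`. It assumes a Linux host and treats the requirement as 16 GiB. The kernel reports somewhat less than an instance's nominal memory, so the sketch accepts 90% of the target. That `TOLERANCE` margin and the function names are hypothetical choices, not an official check. The authoritative numbers are on the hardware requirements page.

```python
# Assumptions: a Linux host (MemTotal comes from /proc/meminfo) and the
# 16 GB figure from the comment above. TOLERANCE is a hypothetical margin:
# the kernel reports somewhat less than an instance's nominal memory.
REQUIRED_GIB = 16
TOLERANCE = 0.9
KIB_PER_GIB = 1024 * 1024


def parse_memtotal_kib(meminfo_text: str) -> int:
    """Return MemTotal in KiB from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16000000 kB" (the kernel's kB means KiB)
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def meets_requirement(memtotal_kib: int,
                      required_gib: float = REQUIRED_GIB,
                      tolerance: float = TOLERANCE) -> bool:
    """True if reported memory is at least tolerance * required_gib GiB."""
    return memtotal_kib >= required_gib * tolerance * KIB_PER_GIB


def check_host(path: str = "/proc/meminfo") -> bool:
    """Read this machine's MemTotal, print a verdict and return it."""
    with open(path) as f:
        kib = parse_memtotal_kib(f.read())
    ok = meets_requirement(kib)
    verdict = "OK" if ok else "below the suggested minimum"
    print(f"MemTotal: {kib / KIB_PER_GIB:.1f} GiB -> {verdict}")
    return ok
```

On an 8 GiB instance such as c5.xlarge, `check_host()` returns `False`, because the reported memory cannot reach the 14.4 GiB threshold.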

Marcin-Radecki commented 1 year ago

Hi, we've updated our hardware requirements page: https://docs.alephzero.org/aleph-zero/validate/hardware-requirements