Cardinal-Cryptography / aleph-node-issues

Issue tracker for aleph-node related problems.

What triggers high-bandwidth mode? #12

Closed eifos-git closed 8 months ago

eifos-git commented 9 months ago

Did you read the documentation and guides?

Is there an existing issue?

Description of the problem

Aleph-node seems to have two modes of operation. Let's call them normal and high-bandwidth.

Normal mode uses about 5 Mbit/s of traffic (roughly 1.5 TB per month). High-bandwidth mode uses about 30 Mbit/s (roughly 9 TB per month). In both modes the node operates normally.
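The monthly totals follow from the sustained rates. A minimal sketch of the conversion, assuming "Mbs" means megabits per second, a 30-day month, and decimal terabytes (the function name is hypothetical):

```python
SECONDS_PER_DAY = 86_400


def monthly_terabytes(rate_mbit_per_s: float, days: int = 30) -> float:
    """Convert a sustained rate in Mbit/s to decimal terabytes over `days` days."""
    total_bits = rate_mbit_per_s * 1_000_000 * SECONDS_PER_DAY * days
    total_bytes = total_bits / 8
    return total_bytes / 1e12


if __name__ == "__main__":
    for label, rate in (("normal", 5), ("high-bandwidth", 30)):
        print(f"{label}: {rate} Mbit/s is about {monthly_terabytes(rate):.2f} TB per 30 days")
```

Under these assumptions the results land slightly above the rounded figures quoted in the report, which is consistent with the rates themselves being approximate.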

The switch between normal and high-bandwidth is instant. Once a node enters high-bandwidth it can never go back to normal until the service is restarted.

Here is the graph for the last few days. image

Every drawn black vertical line represents a manual service restart. This resets the node back to normal mode. What triggers a node to enter high-bandwidth and is there a way to avoid this?
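A jump like the one described (a step up that never comes back down) can be flagged automatically from periodic throughput samples. The following is only a sketch under stated assumptions: the function name, the thresholds, and the sample values are all hypothetical and are not taken from aleph-node or from the graphs in this issue.

```python
from statistics import median
from typing import Optional, Sequence


def find_sustained_jump(
    rates: Sequence[float],
    baseline_window: int = 5,
    factor: float = 3.0,
    hold: int = 5,
) -> Optional[int]:
    """Return the index of the first sample that starts a sustained jump.

    The baseline is the median of the first `baseline_window` samples.
    A jump is `hold` consecutive samples that all exceed baseline * factor.
    Returns None if no such run exists or there are too few samples.
    """
    if len(rates) < baseline_window + hold:
        return None
    threshold = median(rates[:baseline_window]) * factor
    for start in range(baseline_window, len(rates) - hold + 1):
        if all(r > threshold for r in rates[start:start + hold]):
            return start
    return None


if __name__ == "__main__":
    # Made-up samples in Mbit/s: a ramp from ~5 up to ~30 that then stays high.
    ramp = [5, 5, 6, 5, 5, 5, 12, 20, 28, 30, 30, 30, 31, 30]
    # Made-up samples with a single short spike that should not be flagged.
    spike = [5, 5, 5, 5, 5, 40, 5, 5, 5, 5, 5]
    print(find_sustained_jump(ramp), find_sustained_jump(spike))
```

Requiring several consecutive high samples, rather than a single one, is what separates a persistent mode switch from an ordinary short burst of traffic.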

Caveat: this only seems to happen on mainnet. Testnet nodes are unaffected.

Regards, Sofie

Information on your setup.

Mainnet
Version 0.11.4
How do you run aleph-node - Directly
Is it a validator node or RPC-node? - Validator
what flags do you run aleph-node with? - Default'ish
operating system - Ubuntu 22.04.3 LTS
hardware - AMD 16/32 - 128GB - Nvme

Steps to reproduce

No response

Did you attach relevant logs?

obrok commented 8 months ago

Hi, thanks for your report. This does not match any known issue, so the team will investigate and we'll get back to you once we understand more.

woocash2 commented 8 months ago

@eifos-git are you able to specify on which ports you have noticed the increased throughput, so we can determine which components are affected? Also, attaching any logs would always be helpful.

eifos-git commented 8 months ago

@woocash2 I'll post this info the next time the node jumps to high bandwidth. Obviously, now that it knows that I'm watching it, it refuses to do so 😅

Graph for the month: image

eifos-git commented 8 months ago

@woocash2 @obrok The jump happened 40 minutes ago. It took around 10 minutes to reach its peak from which it will now not return until the service is restarted.

Graph for the last two hours image

This is iftop from before the jump. Each individual connection is only a few kbs. iftop-normalmode - redacted

And this is iftop afterwards. Individual connections now peak at 700kbs+. iftop-highbandwidth - redacted

alephnode.log for the last few hours

woocash2 commented 8 months ago

Thank you, this will surely help us with recognizing the issue. :ok_hand:

eifos-git commented 8 months ago

Last week (red line) the node was upgraded to 12.2. Ever since, the connection has been stable and the jump to very high bandwidth hasn't happened. I will keep monitoring for a while, but so far it looks good.

image

woocash2 commented 8 months ago

@eifos-git thanks for the update. Last week we were able to reproduce a rapid throughput increase on one of our mainnet nodes (running version 11), but we couldn't find the cause of this behaviour. However, your most recent report, your previous description of testnet behaviour, and the most recent throughput readings from our mainnet nodes (running 12.2) suggest that the issue is no longer present in the new 12.2 version.

Thank you for raising the issue, and please let us know if something like this appears again. I will close the issue on the 5th of February if there are no further reports.