Closed: 88plug closed this issue 1 week ago
@88plug - could you please confirm:
1). Are these nodes being built via the Akash Helm Charts? Asking because the Helm Charts set minimum_gas_prices: 0.025uakt, and I want to ensure this setting is in place on the affected nodes.
2). During node start-up, are there any log entries regarding 0 gas prices?
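One quick way to answer question 1 is to grep the node's app.toml directly. A minimal sketch, using a sample file in place of a real ~/.akash/config/app.toml (path and contents are illustrative); note that the Cosmos SDK spells the key "minimum-gas-prices" in app.toml:

```shell
# Create a stand-in for ~/.akash/config/app.toml (illustrative only).
cat > /tmp/app_sample.toml <<'EOF'
minimum-gas-prices = "0.025uakt"
EOF

# Confirm the minimum gas price setting is present and non-zero.
grep "minimum-gas-prices" /tmp/app_sample.toml
```

Against a real node, the same grep against the actual app.toml path answers whether the 0.025uakt floor is in place.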
Review from an additional node operator impacted by increased P2P traffic:
- Nodes using both CLI and Helm Chart builds are experiencing heightened traffic
- The CLI node build has the minimum_gas_prices: 0.025uakt setting in app.toml
- Helm Chart default values were not changed and thus should have minimum_gas_prices: 0.025uakt
- The bandwidth is NOT increasing further over time, i.e. the P2P bandwidth rose considerably a few days ago and has been steady at that level since.
- No evidence of 0 gas fees in node logs, but logs are littered with "failed to add vote" errors such as:
Jun 21 19:46:38 mainnet-node start-node.sh[32110]: ERR failed to process message err="error adding vote" height=16846721 module=consensus msg_type=*consensus.VoteMessage peer=2a3ba81a7ddb00016af1593f925aed390c4bcca9 round=0
Jun 21 19:46:38 mainnet-node start-node.sh[32110]: INF failed attempting to add vote err="expected 16846720/1/2, but got 16846720/0/2: unexpected step" module=consensus...
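The "unexpected step" error encodes two height/round/step triples: the step the node expected versus the step carried by the incoming vote. A small sketch that pulls both triples out of a log line for comparison (the line is copied from the logs above; the sed patterns are illustrative, not part of any Akash tooling):

```shell
# Sample consensus error line from the node logs above.
line='INF failed attempting to add vote err="expected 16846720/1/2, but got 16846720/0/2: unexpected step" module=consensus'

# Extract the expected and received height/round/step triples.
expected=$(echo "$line" | sed -n 's#.*expected \([0-9/]*\),.*#\1#p')
got=$(echo "$line" | sed -n 's#.*got \([0-9/]*\):.*#\1#p')

echo "expected=$expected got=$got"   # expected=16846720/1/2 got=16846720/0/2
```

Here the heights match (16846720) but the rounds differ (1 vs 0), which is why the vote is rejected as being for the wrong consensus step.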
It seems that this issue is observed on other networks as well, for example on the Sentinel network:
https://x.com/zeroservices_eu/status/1784553362316288174
I'm not sure exactly how this problem arises, but it seems to spread through specific peers (full node/RPC):
grep "00a39ac3ec012ffa3116a162c17f49df484d0298" .akash/config/config.toml
grep -A 2 -B 2 "00a39ac3ec012ffa3116a162c17f49df484d0298" .akash/config/addrbook.json
I'm not sure why this P2P address appears in the address book 123 times 😳
grep "00a39ac3ec012ffa3116a162c17f49df484d0298" .akash/config/addrbook.json | wc -l
123
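To check whether other peer IDs are similarly duplicated in the address book, one can count occurrences of every ID rather than grepping for a single one. A minimal sketch using only grep/sort/uniq, with a tiny sample file standing in for the real (much larger) ~/.akash/config/addrbook.json; the file layout mirrors the CometBFT address book format but is illustrative:

```shell
# Stand-in for ~/.akash/config/addrbook.json (structure illustrative).
cat > /tmp/addrbook_sample.json <<'EOF'
{"addrs": [
  {"addr": {"id": "00a39ac3ec012ffa3116a162c17f49df484d0298", "ip": "1.2.3.4", "port": 26656}},
  {"addr": {"id": "00a39ac3ec012ffa3116a162c17f49df484d0298", "ip": "5.6.7.8", "port": 26656}},
  {"addr": {"id": "aabbccddeeff00112233445566778899aabbccdd", "ip": "9.9.9.9", "port": 26656}}
]}
EOF

# Count how many entries exist per node ID, most-duplicated first.
grep -o '"id": "[0-9a-f]*"' /tmp/addrbook_sample.json | sort | uniq -c | sort -rn
```

A healthy address book should show each node ID roughly once; counts like the 123 above indicate duplicated entries worth pruning.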
@c29r3 can you back up your addrbook and try the one from Polkachu?
Done, but err="error adding vote" still exists.
Here is the traffic for the last 48 hours.
I enabled --log_level debug mode and saved logs for the last 20 minutes from my RPC node:
sudo journalctl -u akash.service --no-hostname --since "20 minutes ago" | grep -v p2p > akash_20min_log.txt
Fixes excessive bandwidth #285
I ran a battery of tests over the weekend and was able to resolve the issue.
The issue appears to be that the p2p seed_mode is set to true for the node in the Helm charts.
The Cosmos default is pex = true and seed_mode = false.
I have updated the Helm charts and tested with seed_mode disabled, and the excessive bandwidth issue is resolved.
For reference, in my testing I also found that "error adding vote" still shows with the 0.025uakt fee. So that may indicate some other issue, but it was not related to the bandwidth.
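For operators who built nodes from the affected charts, the fix amounts to flipping seed_mode off in the node's config.toml. A hedged sketch of that edit, using a sample file in place of the real ~/.akash/config/config.toml (path and sed invocation are illustrative, not the exact Helm chart change):

```shell
# Stand-in for ~/.akash/config/config.toml with the problematic setting.
cat > /tmp/config_sample.toml <<'EOF'
[p2p]
pex = true
seed_mode = true
EOF

# Disable seed mode, restoring the Cosmos default (pex on, seed mode off).
sed -i 's/^seed_mode = true$/seed_mode = false/' /tmp/config_sample.toml

grep "seed_mode" /tmp/config_sample.toml
```

After the change the node must be restarted for the new p2p behavior to take effect.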
The issue was caused by IBC relayers allowing zero/very-low-gas TXs onto the network and into the mempool. While Akash RPC/validator nodes are universally configured to reject zero-gas TXs, a number of IBC relayers were not configured to reject them.
The issue was resolved by:
1). Specific validators intentionally setting their min gas requirement to zero to allow these TXs to be written to the chain, thus cleansing the validator mempools of such TXs.
2). Working with current IBC relayers to ensure they have min gas settings.
Network P2P traffic is now normalized.
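For relayer operators, the per-chain gas price is what keeps the relayer from submitting zero-gas TXs. As one hedged illustration (not taken from this thread), a Hermes relayer's config.toml carries a gas_price entry per chain; the chain id and values below are examples only:

```toml
# Hypothetical Hermes config.toml fragment; id and price are illustrative.
[[chains]]
id = 'akashnet-2'
gas_price = { price = 0.025, denom = 'uakt' }
```

Other relayer implementations expose an equivalent minimum/gas price setting; the point is that each relayer should price its TXs at or above the network's 0.025uakt floor.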
@Krewedk0 It's not quite correct.
@troian I deleted my last comment so as not to give bad people good ideas. But I ran some tests last night, and you can actually do very nasty stuff with the setup I mentioned.
Also, Chandra Station and 16psyche still have 0 min gas prices set on their validator nodes.
Description:
Since upgrading to Akash node release v36.0, nodes have been consuming an unusually high amount of bandwidth, far exceeding the previous usage. This has resulted in "out of bandwidth" notifications across multiple nodes in various datacenters, as well as noticeable lag on residential networks. No changes were made to the default deployment code.
To Reproduce:
Expected Behavior:
The node should have a sustained bandwidth usage of approximately 5,100,000 BPS (5.1 Mbps) for incoming and 6,000,000 BPS (6 Mbps) for outgoing traffic.
Traffic Analysis:
Upon reviewing the provided screenshot:
Attempted Fixes:
I have attempted to limit the P2P connections and adjust the send_rate and recv_rate parameters in the Cosmos SDK configuration. Despite these efforts, the issue persists.
Request:
Please examine the issue deeper and push a fix to stop the irregular bandwidth consumption.
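For reference, the rate-limit knobs mentioned above live in the [p2p] section of the node's config.toml. The values below are the upstream CometBFT defaults (bytes per second, roughly 5 MB/s each way), shown for illustration rather than as a recommended fix:

```toml
# config.toml — [p2p] rate limits in bytes/second (upstream defaults shown).
[p2p]
send_rate = 5120000
recv_rate = 5120000
```

Lowering these caps throttles per-peer throughput but, as noted above, does not address the root cause of the excessive traffic.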
Recommendation:
Anyone running an Akash node should check their bandwidth consumption and traffic to ensure they are not affected by this issue. Create a point release for v36.0 that stops the excessive bandwidth consumption.
Monthly Traffic View:
![image](https://github.com/akash-network/support/assets/19512127/97670030-abd5-4c98-8440-334cebda77b6)
Before Upgrade Daily:
After Upgrade Daily:
Additional Context:
This issue is critical as it affects the performance and reliability of the nodes across various datacenters and residential networks. Immediate attention and resolution are required.