ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private; decentralized; and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License
1.44k stars 338 forks source link

Reserve Radius oscillating #4659

Closed ldeffenb closed 1 month ago

ldeffenb commented 2 months ago

Context

2.0.1 mainnet swarm

Summary

When the reserve capacity was exceeded, the storage radius was expected to transition from 10 to 11. Unfortunately, some of my nodes went into an oscillation between radius 10 and 11 and have yet to completely settle down.

Expected behavior

Transition should have gone once and stayed at the new level until a substantial change in the swarm reserve population happens.

Actual behavior

Storage radius bounces up and down even though the swarm chunk population is on a steady increase. From my nodes' logs:

0x300...

"time"="2024-04-30 07:59:01.293708" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 08:58:26.972574" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 08:59:24.221710" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 09:28:26.817659" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 09:40:46.761718" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 09:58:27.009458" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 10:19:49.803257" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 10:58:27.481599" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 13:47:37.214007" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11

0xa00...

"time"="2024-04-30 07:45:19.676353" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 07:49:59.143763" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 07:51:06.566041" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 08:19:59.462574" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 09:07:04.571970" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 09:19:59.401499" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 09:44:08.366476" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 10:19:59.306399" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 10:47:41.125361" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 11:19:58.941507" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 11:21:47.615096" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 11:49:59.538536" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 11:51:52.523748" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 12:19:58.672539" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 12:20:47.682179" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 12:49:59.119967" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 12:50:12.426554" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 13:19:59.024534" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 13:25:25.898105" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 13:49:58.826762" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 13:50:28.169587" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11

0xe7e...

"time"="2024-04-29 18:43:42.118981" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 19:34:11.812235" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 20:31:52.703655" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 20:34:11.920637" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 20:34:34.605685" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 21:04:11.870243" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 21:21:29.511450" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 21:34:11.802812" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 21:34:35.889944" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 22:04:11.979058" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 22:13:15.505857" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 22:34:11.871581" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 22:36:10.964471" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 23:04:11.788607" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 23:04:52.626159" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-29 23:34:11.715272" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-29 23:34:38.393734" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 00:04:11.875542" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 00:12:09.757996" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 01:04:11.601500" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 02:32:40.068996" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 03:04:11.930800" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 03:04:16.244673" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 03:34:11.992070" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 03:34:19.872361" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 04:04:11.783548" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 04:09:52.898739" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 04:34:11.866944" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 04:34:31.829674" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 05:04:11.813139" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10
"time"="2024-04-30 14:35:44.475450" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11
"time"="2024-04-30 15:04:11.822485" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10

Steps to reproduce

Just find the right neighborhood and check your logs for "reserve radius".

Possible solution

No idea at this time, but apparently the triggers for going up and going down are not very stable when at certain levels of reserve population.

Consider the ramifications of an increased reserve radius causing 1/2 of the chunks to be evicted and then the reserve radius going back down. Those evicted chunks need to be recovered from somewhere, but the pullsync intervals are not reset, so the new radius 10 peers are not re-pulled causing the reserve to be incomplete until it transitions again back to 11. As much as I hate the load on the swarm when pullsync intervals are reset, it definitely seems necessary when the reserve radius decreases. But then, please don't fix that until the cause of the radius oscillation is determined and corrected!

ldeffenb commented 2 months ago

I added additional logging to the "reserve radius" logs and see this:

"time"="2024-04-30 17:17:15.653076" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2068699 "threshold"=2097152
"time"="2024-04-30 17:21:01.485788" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=0 "target"=1
"time"="2024-04-30 17:47:15.706153" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2069049 "threshold"=2097152
"time"="2024-04-30 17:54:05.898721" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=3 "target"=38

Now I think I need to get and log the current "reserve in radius" or whatever is triggering the failed eviction pass that results in the radius going back to 11. And possibly what eviction actually caused it to drop back to 10, although I might actually have that log in place already.

ldeffenb commented 2 months ago

A more complete picture of the reserve radius oscillation with unreserve activity interspersed:

"time"="2024-04-30 16:42:47.858780" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:42:47.910740" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=25 "target"=1 "radius"=11
"time"="2024-04-30 16:46:09.984192" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:46:09.995303" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=1 "target"=1 "radius"=11
"time"="2024-04-30 16:46:13.175057" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:46:13.205142" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=14 "target"=1 "radius"=11
"time"="2024-04-30 16:46:59.474077" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:46:59.487408" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=11
"time"="2024-04-30 16:47:02.481340" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:47:02.532747" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=37 "target"=1 "radius"=11
"time"="2024-04-30 16:49:01.351044" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:49:01.376584" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=36 "target"=1 "radius"=11
"time"="2024-04-30 16:52:56.032044" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:52:56.082562" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=70 "target"=1 "radius"=11
"time"="2024-04-30 16:58:50.480759" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:58:50.492510" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=11 "target"=1 "radius"=11
"time"="2024-04-30 16:59:36.378887" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 16:59:36.423424" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=31 "target"=1 "radius"=11
"time"="2024-04-30 17:02:07.063475" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:02:07.098790" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=15 "target"=1 "radius"=11
"time"="2024-04-30 17:03:04.458098" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:03:04.477709" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=11
"time"="2024-04-30 17:03:20.046137" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:03:20.103290" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=86 "target"=1 "radius"=11
"time"="2024-04-30 17:11:11.190075" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:11:11.241392" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=40 "target"=1 "radius"=11
"time"="2024-04-30 17:14:41.631058" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:14:41.656774" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=11
"time"="2024-04-30 17:14:51.276002" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:14:51.332756" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=60 "target"=1 "radius"=11
"time"="2024-04-30 17:17:15.653076" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2068699 "threshold"=2097152
"time"="2024-04-30 17:18:51.521460" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:18:51.576204" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=10
"time"="2024-04-30 17:19:13.995353" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:14.087825" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=10
"time"="2024-04-30 17:19:27.424530" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:27.501146" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=1 "target"=1 "radius"=10
"time"="2024-04-30 17:19:31.909752" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:31.956507" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=10 "target"=1 "radius"=10
"time"="2024-04-30 17:20:01.548064" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:01.616251" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=2 "target"=1 "radius"=10
"time"="2024-04-30 17:20:05.106759" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:05.132815" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=1 "target"=1 "radius"=10
"time"="2024-04-30 17:20:12.897006" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:12.975257" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=9 "target"=1 "radius"=10
"time"="2024-04-30 17:21:01.363323" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:21:01.485788" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=0 "target"=1
"time"="2024-04-30 17:21:01.525302" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=92 "target"=1 "radius"=11
"time"="2024-04-30 17:29:16.020711" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:29:16.062859" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=95 "target"=1 "radius"=11
"time"="2024-04-30 17:36:04.329001" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:36:04.465199" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=451 "target"=1 "radius"=11
"time"="2024-04-30 17:47:15.706153" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2069049 "threshold"=2097152
"time"="2024-04-30 17:54:05.732476" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:54:05.775163" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=10
"time"="2024-04-30 17:54:05.775170" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=38 "radius"=10
"time"="2024-04-30 17:54:05.898721" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=3 "target"=38
"time"="2024-04-30 17:54:05.925511" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=71 "target"=38 "radius"=11
"time"="2024-04-30 17:57:52.851208" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:57:52.889357" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=26 "target"=1 "radius"=11
"time"="2024-04-30 17:59:54.845527" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:59:54.861630" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=22 "target"=1 "radius"=11
"time"="2024-04-30 18:02:56.494564" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:03:50.044736" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=207037 "target"=1 "radius"=11
ldeffenb commented 2 months ago

And adding another log to the mix shows something even more strange. First note that the "new_size" definitely seems to be tracking the total reserve size, not the size within radius. Then also notice the transition from unreserving bin 10 to 11 at the gap in the following logs. Note: these logs are coming from my extra-logging fork at https://github.com/ldeffenb/bee/tree/0292db5a768e5690d92184d6440e5568d81a4005

"time"="2024-04-30 17:17:15.653076" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2068699 "threshold"=2097152
"time"="2024-04-30 17:18:51.521460" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:18:51.576180" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=6 "batchID"="2f3a951f7d37029a16d00322a1754eb12ea7898858df14b660a6740bda1b9168" "new_size"=4194299
"time"="2024-04-30 17:18:51.576204" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=10
"time"="2024-04-30 17:19:13.995353" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:14.087808" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=3 "batchID"="5332380e0ebf63d45976c08731988acf3061cff55e9869485c777cebffb8d599" "new_size"=4194302
"time"="2024-04-30 17:19:14.087825" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=10
"time"="2024-04-30 17:19:27.424530" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:27.501129" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=1 "batchID"="5332380e0ebf63d45976c08731988acf3061cff55e9869485c777cebffb8d599" "new_size"=4194304
"time"="2024-04-30 17:19:27.501146" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=1 "target"=1 "radius"=10
"time"="2024-04-30 17:19:31.909752" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:19:31.956499" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=10 "batchID"="634e99cfc03c2e9e9a7941213e91af2d0d0ec2e2a22854141f561d76ef300ba7" "new_size"=4194295
"time"="2024-04-30 17:19:31.956507" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=10 "target"=1 "radius"=10
"time"="2024-04-30 17:20:01.548064" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:01.616238" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=2 "batchID"="9af1b37f38d4af75dbee9ba713b95bcc6d03e1c759ac774ec8bd4864e68d2c03" "new_size"=4194303
"time"="2024-04-30 17:20:01.616251" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=2 "target"=1 "radius"=10
"time"="2024-04-30 17:20:05.106759" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:05.132807" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=1 "batchID"="2f3a951f7d37029a16d00322a1754eb12ea7898858df14b660a6740bda1b9168" "new_size"=4194304
"time"="2024-04-30 17:20:05.132815" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=1 "target"=1 "radius"=10
"time"="2024-04-30 17:20:12.897006" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:20:12.975248" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=9 "batchID"="bc674e5e1114f88a957a82768436b6edfd960781fc95f2832e6c6f1e9c0b56fd" "new_size"=4194296
"time"="2024-04-30 17:20:12.975257" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=9 "target"=1 "radius"=10

Why the jump from 10 to 11?

"time"="2024-04-30 17:21:01.363323" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:21:01.525290" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=92 "batchID"="0bb68eeb6c1b5d0d36d6fd10f9f77e121bfc2dce065461d0c8c49f03cb765be6" "new_size"=4194213
"time"="2024-04-30 17:21:01.525302" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=92 "target"=1 "radius"=11

"time"="2024-04-30 17:29:16.020711" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:29:16.062849" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=95 "batchID"="0c0eacbb92f25cd4d23e487d142051f02a649f2e234fb7d820bd25515ff6857d" "new_size"=4194210
"time"="2024-04-30 17:29:16.062859" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=95 "target"=1 "radius"=11
"time"="2024-04-30 17:36:04.329001" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:36:04.465185" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=451 "batchID"="0c41d61f798c6ced7a4b7ccbd35ea11b6542cfc4adde40027a49a86ba586ea2b" "new_size"=4193854
"time"="2024-04-30 17:36:04.465199" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=451 "target"=1 "radius"=11
"time"="2024-04-30 17:47:15.706153" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2069049 "threshold"=2097152
"time"="2024-04-30 17:54:05.732476" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10
"time"="2024-04-30 17:54:05.775152" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=3 "batchID"="2f3a951f7d37029a16d00322a1754eb12ea7898858df14b660a6740bda1b9168" "new_size"=4194342
"time"="2024-04-30 17:54:05.775163" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=10
"time"="2024-04-30 17:54:05.775170" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=38 "radius"=10
"time"="2024-04-30 17:54:05.821675" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=1 "batchID"="634e99cfc03c2e9e9a7941213e91af2d0d0ec2e2a22854141f561d76ef300ba7" "new_size"=4194341
"time"="2024-04-30 17:54:05.843779" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=10 "evicted"=2 "batchID"="9af1b37f38d4af75dbee9ba713b95bcc6d03e1c759ac774ec8bd4864e68d2c03" "new_size"=4194339
"time"="2024-04-30 17:54:05.898721" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=3 "target"=38
"time"="2024-04-30 17:54:05.906004" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=17 "batchID"="0c41d61f798c6ced7a4b7ccbd35ea11b6542cfc4adde40027a49a86ba586ea2b" "new_size"=4194322
"time"="2024-04-30 17:54:05.910578" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=10 "batchID"="0c4bcafde5be94b877833f46607e969b66cd8372a75960e3be6f128e9721d5e5" "new_size"=4194312
"time"="2024-04-30 17:54:05.925501" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=41 "batchID"="0d49a0f2b594749539fadd99514a5251d72ee48280bf348550490a581a2d2f23" "new_size"=4194271
"time"="2024-04-30 17:54:05.925511" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=71 "target"=38 "radius"=11
"time"="2024-04-30 17:57:52.851208" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:57:52.889334" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=26 "batchID"="0e1dab3d4c26128ee72b5d9341fc7885ecfcec768d2c44ff34afcd07f278ab82" "new_size"=4194279
"time"="2024-04-30 17:57:52.889357" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=26 "target"=1 "radius"=11
"time"="2024-04-30 17:59:54.845527" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:59:54.861619" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=22 "batchID"="0e2ddc48d7e3719d1bf14f16bbe83c81f4c0ac1bb5a10ea8cddcfce411886988" "new_size"=4194283
"time"="2024-04-30 17:59:54.861630" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=22 "target"=1 "radius"=11

And this apparently large batch eviction ended up triggering the decrement of the reserve radius to follow.   Wanted to evict just 1 chunk, but ended up evicting 207,037.   Granted, not enough in itself to trigger a radius change, but that could also be a possibility with a very large batch that somehow dropped 2,000,000 chunks into a single neighborhood.

"time"="2024-04-30 18:02:56.494564" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:03:50.044727" "level"="debug" "logger"="node/storer" "msg"="reserve eviction" "uptoBin"=11 "evicted"=207037 "batchID"="0e8366a6fdac185b6f0327dc89af99e67d9d3b3f2af22432542dc5971065c1df" "new_size"=3987274
"time"="2024-04-30 18:03:50.044736" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=207037 "target"=1 "radius"=11

"time"="2024-04-30 18:17:15.332073" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2069371 "threshold"=2097152
ldeffenb commented 2 months ago

And yet another iteration with additional logs provided in https://github.com/ldeffenb/bee/tree/d5c2646808030e52f4578c369b5862e8739686b8

It seems like there should be one and only one bit of code controlling reserve radius increase AND decrease and the former should be triggered (maybe) when the total reserve size goes over the capacity. What seems to be happening is that the "in radius" checker sees the level 11 is under-populated so it lowers it to 10. But then the evictor fails to make the target (don't ask me why?) so it raises the radius back to 11. But in fact, the total reserve doesn't have enough chunks in radius 11 to stay there, so it drops to 10, but then eviction (which maybe doesn't actually have the proper batch/bin records to evict properly?) fails to make the target, so it raises it back to 11. And loop forever.

Maybe the reserve should be ALLOWED to go over-capacity UNTIL an "in radius" actual count can be done. And maybe there needs to be a routine that evicts EVERYTHING that is OUTSIDE the radius rather than trying to go one batch/bin at a time? That way it's only the reserve iterator that does it all and doesn't rely on a (potentially incomplete?) batch/bin index to identify which chunks should be evicted?

Drop to 10...

"time"="2024-04-30 17:35:53.928104" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2094402 "threshold"=2097152 "size"=4193500
"time"="2024-04-30 17:38:58.669601" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10

and the unreserve doesn't make the target, so we increase to 11

"time"="2024-04-30 17:38:58.831092" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=0 "target"=1 "size"=4194464
"time"="2024-04-30 17:38:58.832474" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=2 "target"=1 "radius"=11 "size"=4194462
"time"="2024-04-30 17:38:58.832513" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=158 "radius"=11
"time"="2024-04-30 17:38:58.917816" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=366 "target"=158 "radius"=11 "size"=4194096
"time"="2024-04-30 17:56:56.386541" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:56:56.496861" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=31 "target"=1 "radius"=11 "size"=4194274
"time"="2024-04-30 17:59:26.417036" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 17:59:26.516763" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=12 "target"=1 "radius"=11 "size"=4194293
"time"="2024-04-30 18:00:07.307659" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:00:07.324519" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=11 "size"=4194299
"time"="2024-04-30 18:00:15.061460" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:00:15.104545" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=11 "size"=4194299
"time"="2024-04-30 18:00:31.969674" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:00:32.010658" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=11 "size"=4194299
"time"="2024-04-30 18:00:41.668824" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:00:41.706487" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=25 "target"=1 "radius"=11 "size"=4194280
"time"="2024-04-30 18:02:18.512184" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:02:18.607418" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=19 "target"=1 "radius"=11 "size"=4194286
"time"="2024-04-30 18:04:10.068743" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:04:10.151247" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=19 "target"=1 "radius"=11 "size"=4194286
"time"="2024-04-30 18:05:40.662497" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:05:40.726525" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=23 "target"=1 "radius"=11 "size"=4194282

But then the "in radius" decides we don't have enough so we go back down to 10...

"time"="2024-04-30 18:05:54.051621" "level"="info" "logger"="node/storer" "msg"="reserve radius decrease" "radius"=10 "count"=2094768 "threshold"=2097152 "size"=4194287
"time"="2024-04-30 18:06:09.555456" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=10

And then the next unreserve fails (why?) so we go back up to 11...

"time"="2024-04-30 18:06:09.700775" "level"="info" "logger"="node/storer" "msg"="reserve radius increase" "radius"=11 "evicted"=0 "target"=1 "size"=4194334
"time"="2024-04-30 18:06:09.712628" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=3 "target"=1 "radius"=11 "size"=4194331
"time"="2024-04-30 18:06:09.712656" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=27 "radius"=11
"time"="2024-04-30 18:06:09.739441" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=30 "target"=27 "radius"=11 "size"=4194301
"time"="2024-04-30 18:06:10.039841" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:06:10.062060" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=12 "target"=1 "radius"=11 "size"=4194304
"time"="2024-04-30 18:06:10.062941" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:06:10.107416" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=52 "target"=1 "radius"=11 "size"=4194255
"time"="2024-04-30 18:06:10.542226" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:06:10.560749" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=11 "target"=1 "radius"=11 "size"=4194300
"time"="2024-04-30 18:06:10.564554" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:06:10.573865" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=4 "target"=1 "radius"=11 "size"=4194309
"time"="2024-04-30 18:06:10.573887" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=5 "radius"=11
"time"="2024-04-30 18:06:10.587181" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=9 "target"=5 "radius"=11 "size"=4194314
"time"="2024-04-30 18:06:10.587211" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=10 "radius"=11
"time"="2024-04-30 18:06:10.600918" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=13 "target"=10 "radius"=11 "size"=4194311
"time"="2024-04-30 18:06:10.600946" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=7 "radius"=11
"time"="2024-04-30 18:06:10.613411" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=8 "target"=7 "radius"=11 "size"=4194306
"time"="2024-04-30 18:06:10.613444" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=2 "radius"=11
"time"="2024-04-30 18:06:10.628527" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=11 "target"=2 "radius"=11 "size"=4194307
"time"="2024-04-30 18:06:10.628553" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=3 "radius"=11
"time"="2024-04-30 18:06:10.647768" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=14 "target"=3 "radius"=11 "size"=4194309
"time"="2024-04-30 18:06:10.647798" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=5 "radius"=11
"time"="2024-04-30 18:06:10.664820" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=5 "radius"=11 "size"=4194317
"time"="2024-04-30 18:06:10.664846" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=13 "radius"=11
"time"="2024-04-30 18:06:10.731846" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=44 "target"=13 "radius"=11 "size"=4194326
"time"="2024-04-30 18:06:10.731870" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=22 "radius"=11
"time"="2024-04-30 18:06:10.822262" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=31 "target"=22 "radius"=11 "size"=4194339
"time"="2024-04-30 18:06:10.822297" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=35 "radius"=11
"time"="2024-04-30 18:06:10.883174" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=49 "target"=35 "radius"=11 "size"=4194332
"time"="2024-04-30 18:06:10.883207" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=28 "radius"=11
"time"="2024-04-30 18:06:10.927568" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=30 "target"=28 "radius"=11 "size"=4194304
"time"="2024-04-30 18:06:19.749084" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:06:19.799066" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=7 "target"=1 "radius"=11 "size"=4194298
"time"="2024-04-30 18:07:07.214476" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:07:07.272042" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=2 "target"=1 "radius"=11 "size"=4194303
"time"="2024-04-30 18:07:11.888326" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:07:11.922295" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=9 "target"=1 "radius"=11 "size"=4194296
"time"="2024-04-30 18:08:00.044641" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:08:00.114654" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=6 "target"=1 "radius"=11 "size"=4194299
"time"="2024-04-30 18:08:33.836729" "level"="info" "logger"="node/storer" "msg"="unreserve start" "target"=1 "radius"=11
"time"="2024-04-30 18:08:33.950028" "level"="info" "logger"="node/storer" "msg"="unreserve finished" "evicted"=56 "target"=1 "radius"=11 "size"=4194249
ldeffenb commented 1 month ago

Since the batch evictor has been modified to only remove the required number of chunks rather than the entire batch, the oscillation seems to have been fixed. At least, the radius changes in the sepolia testnet showed no oscillations in 2.1.0-rc2.