ethstorage / es-node

Golang implementation of the EthStorage node.

fix Issue 232: Use a dynamically adjusted request size to fetch blob data from peers through p2p instead of the static p2p.max.request.size value #287

Closed ping-ke closed 5 months ago

ping-ke commented 6 months ago

Issue: https://github.com/ethstorage/es-node/issues/232

Currently, we use p2p.max.request.size to control the size of blobs fetched from peers. However, peers are distributed across different regions, so the optimal request size between the local node and each peer differs. We should therefore use a different request size for each peer.

  1. Rename p2p.max.request.size to p2p.request.size, which is used to initialize the request size for a new peer.
  2. Add a tracker for each peer that adjusts the request size according to network conditions between that peer and the local node: capacity = 0.9 * t.capacity + 0.1 * (returned blob count / return time in seconds).
  3. When selecting an idle peer to send a new request, order the idle peers by capacity and pick the one with the largest capacity.

This design follows the rates design in github.com/ethereum/go-ethereum/eth/protocols/snap/sync.go.

A similar design also exists in Prysm: it sorts peers using both capacity and a score (processedBatches * 0.1), and adds some randomness when deciding whether to select a given peer.

How to test: set the log level to debug, or change the following log call in tracker.go to Info, then run the es-node from the beginning: `log.Debug("Update tracker", "peer id", t.peerID, "elapsed", elapsed, "items", items, "old capacity", oldcap, "capacity", t.capacity)`

Then check for log lines like the following: `t=2024-05-25T11:46:19+0000 lvl=info msg="Update tracker" "peer id"=16Uiu2HAmGAyykt2njnJYTSU9KsiFutrQKZA1w8LhS45ERpqxfwFV elapsed=333.787809ms items=8,388,608 "old capacity"=39724207.481 capacity=38264942.629`

The following is the test result for this feature between a local node and a peer on AX101. The initial request size is 8 MB; after 3 minutes, the request size stabilizes between 4.5 MB and 5.2 MB.

qzhodl commented 5 months ago


It would be great to provide the reference code link for Geth so that reviewers can compare it. Additionally, as mentioned in the ACD meeting, it would be helpful to list how Prysm implements it as well.

qzhodl commented 5 months ago

What about Prysm?

qzhodl commented 5 months ago

I feel the PR comment needs to describe how we tested it.

qzhodl commented 5 months ago


I think a valid test result would be: if the two nodes have a poor internet connection (e.g., between China and AX101), the tracker's capacity should quickly adapt to a very small value, and vice versa.