Closed. ping-ke closed this pull request 5 months ago.
Issue: #232
Currently, we use p2p.max.request.size to control the size of blob requests sent to peers. However, peers are distributed across different regions, so the appropriate request size between the local node and each peer differs. We should therefore use a different request size for each peer when fetching blobs.
- Change p2p.max.request.size to p2p.request.size, which is used to initialize the request size for a new peer.
- Add a tracker for each peer that adjusts the request size according to the network conditions between that peer and the local node. The capacity estimate is an exponentially weighted moving average: capacity = 0.9 * old capacity + 0.1 * (returned blob count / return time in seconds). A minimal sketch follows this list.
- When selecting an idle peer to send a new request to, order the idle peers by capacity and pick the one with the highest capacity.
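A minimal sketch of the tracker and peer-selection idea described above, assuming illustrative names (tracker, update, and selectIdlePeer are not the actual es-node identifiers):

package peertracker

import (
	"sort"
	"time"
)

// tracker keeps an exponentially weighted moving average of how fast a
// peer has been delivering blobs (illustrative only).
type tracker struct {
	peerID   string
	capacity float64 // estimated returned blobs per second
}

// update folds one finished request into the estimate:
// capacity = 0.9*old + 0.1*(items / elapsed seconds)
func (t *tracker) update(elapsed time.Duration, items int) {
	measured := float64(items) / elapsed.Seconds()
	t.capacity = 0.9*t.capacity + 0.1*measured
}

// selectIdlePeer orders idle peers by capacity and returns the one with
// the highest estimate; nil if no peer is idle.
func selectIdlePeer(idle []*tracker) *tracker {
	if len(idle) == 0 {
		return nil
	}
	sort.Slice(idle, func(i, j int) bool {
		return idle[i].capacity > idle[j].capacity
	})
	return idle[0]
}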
It would be great to provide the reference code link for Geth so that reviewers can compare it. Additionally, as mentioned in the ACD meeting, it would be helpful to list how Prysm implements it as well.
What about Prysm?
I feel the PR comment needs to describe how we tested it.
This design refers to the rates design in github.com/ethereum/go-ethereum/eth/protocols/snap/sync.go.
A similar design also exists in Prysm: it sorts peers using both capacity and a score (processedBatches * 0.1), and it also adds some randomness when deciding whether to select a given peer.
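As a rough illustration of that selection style (this is not Prysm's actual code; only the processedBatches * 0.1 weight comes from the description above, and the jitter factor and function name are assumptions), extending the earlier sketch:

import "math/rand"

// scoredCapacity mixes a peer's capacity with a processed-batches score
// and a small random factor, so selection is mostly, but not strictly,
// capacity-ordered.
func scoredCapacity(capacity float64, processedBatches int, rng *rand.Rand) float64 {
	score := capacity + float64(processedBatches)*0.1
	return score * (0.95 + 0.1*rng.Float64()) // assumed +/-5% jitter
}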
How To Test: Change the log level to debug, or change the following log call in tracker.go to Info. Then run the es-node from the beginning.
log.Debug("Update tracker", "peer id", t.peerID, "elapsed", elapsed, "items", items, "old capacity", oldcap, "capacity", t.capacity)
Then check the log for entries like the following:
t=2024-05-25T11:46:19+0000 lvl=info msg="Update tracker" "peer id"=16Uiu2HAmGAyykt2njnJYTSU9KsiFutrQKZA1w8LhS45ERpqxfwFV elapsed=333.787809ms items=8,388,608 "old capacity"=39724207.481 capacity=38264942.629
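As a sanity check, this sample line is consistent with the update formula: 0.9 * 39724207.481 + 0.1 * (8388608 / 0.333787809) ≈ 35751786.7 + 2513155.9 ≈ 38264942.6, which matches the reported new capacity of 38264942.629.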
I think a valid test result would be: if the two nodes have a poor internet connection (e.g., China and ax101), the tracker’s capacity would quickly adapt to a very small value, and vice versa.
Test result for this feature between the local node and a peer on AX101: the initial request size is 8M, and after 3 minutes the request size stabilizes between 4.5M and 5.2M.