AmeanAsad closed this 1 year ago.
Hey @willscott, I'm noticing that the `ConsiderUpdateWeightedNode` function in the hashring lib is not working entirely as intended. It's returning negative deltas for new nodes that are just getting added to the pool.
Do you believe this is a bug in `ConsiderUpdateWeightedNode`? Do you have a failing test against it that you can point me to so that I can help debug it?
@willscott
I think it would help to get an idea of the expected behavior here so I understand what I'm testing against. The example I'm going off: say I have a new node that is not currently in the hash ring and I run `ConsiderUpdateWeightedNode("nodeUrl", 1)`. What should I expect to receive?
I would expect `ConsiderUpdateWeightedNode` to return a negative value for a new node: you're taking part of the hash ring that currently sends its requests to a node with known latency/performance characteristics and proposing to give that space to a node with unknown characteristics instead. The expected outcome of that is worse.
One way that confidence gets better is if mirroring has already sent some test requests to the new node, so we have some sense that it could be good.
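A minimal Go sketch of that intuition, assuming a scheme the thread does not specify: each node's priority is shrunk toward a pessimistic prior, with less shrinkage the more requests we have observed (a Bayesian-style average). `nodeStats`, `priorPriority`, `priorWeight`, `effectivePriority` and `expectedDelta` are hypothetical names, and higher priority is assumed to be better so that a negative delta means "expected to be worse". With no samples the new node sits exactly at the prior and the delta is negative; once mirroring has produced enough observations of a good node, the delta can turn positive.

```go
package main

import "fmt"

// nodeStats is a hypothetical summary of what we know about a node.
// Higher priority is assumed to be better, so swapping a known node for an
// unknown one produces a negative delta, as expected in the thread.
type nodeStats struct {
	observedPriority float64 // mean priority measured from (mirrored) requests
	samples          int     // number of requests observed so far
}

// Hypothetical tuning knobs: an unobserved node is assumed to have
// priorPriority, and that assumption counts as priorWeight pseudo-samples.
const (
	priorPriority = 0.0
	priorWeight   = 5.0
)

// effectivePriority blends the observed priority with the pessimistic prior.
// With zero samples it returns exactly priorPriority; as samples grow it
// converges to observedPriority.
func effectivePriority(s nodeStats) float64 {
	n := float64(s.samples)
	return (priorWeight*priorPriority + n*s.observedPriority) / (priorWeight + n)
}

// expectedDelta has the same shape as (candidate - neighbor) * neighborVolume.
// A negative result means the candidate is expected to perform worse.
func expectedDelta(candidate, neighbor nodeStats, neighborVolume float64) float64 {
	return (effectivePriority(candidate) - effectivePriority(neighbor)) * neighborVolume
}

func main() {
	known := nodeStats{observedPriority: 1.0, samples: 1000}
	unknown := nodeStats{observedPriority: 1.0, samples: 0}   // no data at all
	mirrored := nodeStats{observedPriority: 1.2, samples: 50} // mirroring sent test traffic

	fmt.Printf("unknown:  %.3f\n", expectedDelta(unknown, known, 1.0))  // negative
	fmt.Printf("mirrored: %.3f\n", expectedDelta(mirrored, known, 1.0)) // positive
}
```

The prior weight controls how many mirrored requests a newcomer needs before it can outrank an established node.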
The other question is how to handle initial startup: we probably need to enforce some minimum reasonable pool size, below which we're okay with negative updates.
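One way to express that startup rule, sketched in Go under the assumption of a hypothetical `minPoolSize` floor and a hypothetical `shouldAdd` helper (neither comes from the library): below the floor, candidates are accepted regardless of their delta; at or above it, only expected improvements are accepted.

```go
package main

import "fmt"

// minPoolSize is a hypothetical floor: below it, the pool accepts new nodes
// even when their expected delta is negative, so startup can fill the ring.
const minPoolSize = 3

// shouldAdd decides whether a candidate joins the pool.
func shouldAdd(poolSize int, delta float64) bool {
	if poolSize < minPoolSize {
		return true // still bootstrapping: tolerate negative updates
	}
	return delta > 0 // steady state: only accept expected improvements
}

func main() {
	pool := 0
	deltas := []float64{-0.5, -0.2, -0.9, -0.1, 0.3}
	for _, d := range deltas {
		if shouldAdd(pool, d) {
			pool++
		}
	}
	fmt.Println("pool size:", pool) // pool size: 4
}
```

The first three candidates are admitted despite negative deltas because the pool is below the floor; after that, the `-0.1` candidate is rejected and the `0.3` candidate is accepted.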
@willscott, the reason the tests are passing is that `PoolConsiderationCount` is 30, so all the nodes in the test are automatically added to the pool. I had temporarily changed that value earlier to test with smaller groups of nodes for easier debugging.
@willscott, a few things I found:

`MaybeSubstituteOrAdd` uses the neighbor's `Rate()` as part of the delta formula:

```go
neighbor = nr.Nodes[n]
neighborVolume := neighbor.Rate()
// how much worse is candidate?
diff := candidate.Priority() - neighbor.Priority()
delta += diff * neighborVolume * float64(v)
```
`Rate()` is always zero for nodes with no stats, so the first test always fails: when a candidate that has stats is compared against a neighbor that has none, `neighborVolume` is zero and the delta is therefore always zero.
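A self-contained reproduction of the zero-volume effect, using a hypothetical `node` type whose `Rate()` and `Priority()` methods stand in for the real ones; `v` is treated as an opaque integer weight because its meaning in the quoted loop is not shown. Whatever the priority gap, a neighbor with `Rate() == 0` makes the product zero.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a pool member. As described in the
// thread, Rate() is zero for a node with no recorded stats.
type node struct {
	priority float64
	rate     float64 // observed request rate; 0 when there are no stats
}

func (n node) Priority() float64 { return n.priority }
func (n node) Rate() float64     { return n.rate }

// delta mirrors one iteration of the quoted accumulation. v is an opaque
// weight standing in for the loop variable of the same name.
func delta(candidate, neighbor node, v int) float64 {
	neighborVolume := neighbor.Rate()
	diff := candidate.Priority() - neighbor.Priority()
	return diff * neighborVolume * float64(v)
}

func main() {
	candidate := node{priority: 5, rate: 10} // has recorded many successes
	fresh := node{priority: 1, rate: 0}      // no stats yet, so Rate() == 0
	busy := node{priority: 1, rate: 10}

	fmt.Println(delta(candidate, fresh, 3)) // 0: zero volume hides the gap
	fmt.Println(delta(candidate, busy, 3))  // 120
}
```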
`ConsiderUpdateWeightedNode` in the first test returns negative values when comparing a node in the hash ring that has no stats against a new node that has recorded a bunch of successes. Based on your description above, this seems like unexpected behavior.

@aarshkshah1992 test flakiness addressed.
cc @willscott: I was able to reproduce the flakiness locally and made changes to the topN selection. I believe using `heap.Init` alone only guarantees that the first element is in its sorted position; the rest of the slice is only heap-ordered.
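For reference, a Go sketch of top-N selection with the standard `container/heap` package: `heap.Init` only establishes the heap invariant (for this max-heap, the largest value sits at index 0), so the N best elements come out in order only by calling `heap.Pop` N times. The `maxHeap` type and `topN` function are illustrative, not the code from the PR.

```go
package main

import (
	"container/heap"
	"fmt"
)

// maxHeap orders float64 scores so that the largest is at index 0.
type maxHeap []float64

func (h maxHeap) Len() int           { return len(h) }
func (h maxHeap) Less(i, j int) bool { return h[i] > h[j] }
func (h maxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)        { *h = append(*h, x.(float64)) }
func (h *maxHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topN returns the n largest scores in descending order without mutating
// the input. heap.Init only makes h[0] the max; popping yields sorted order.
func topN(scores []float64, n int) []float64 {
	h := make(maxHeap, len(scores))
	copy(h, scores)
	heap.Init(&h)
	out := make([]float64, 0, n)
	for i := 0; i < n && h.Len() > 0; i++ {
		out = append(out, heap.Pop(&h).(float64))
	}
	return out
}

func main() {
	scores := []float64{3, 9, 1, 7, 5, 8}
	h := maxHeap(append([]float64(nil), scores...))
	heap.Init(&h)
	fmt.Println("after Init:", h) // heap-ordered, not fully sorted
	fmt.Println("top 3:", topN(scores, 3)) // top 3: [9 8 7]
}
```

Reading the first N entries of the slice right after `heap.Init` can therefore return the wrong set or the wrong order, which is consistent with the flakiness described above.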
@willscott, I added a test that targets cache affinity. The relevant commit is here: https://github.com/filecoin-saturn/caboose/pull/153/commits/1975f49d26b9a2a28234d4e40aa71118acc0c060
The test tries to ensure that if we have a set of well-performing nodes to which we have previously made CID requests, we still hit those nodes for those CIDs even when other, similarly performing nodes join the pool. Currently that test is failing: new nodes that join the pool show up as the first candidate nodes for those CIDs instead.
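As a baseline for what affinity looks like, here is a minimal, unweighted consistent-hash ring in Go. It is a hypothetical illustration, not the hashring library caboose uses, and `ring`, `replicas` and `cidKeys` are made-up names. In such a ring, adding a node can only move a key to the new node; every other key keeps its previous owner. The newcomer still takes roughly its share of the keyspace unless its weight is kept small.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring is a minimal, unweighted consistent-hash ring for illustration only.
type ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // position -> node that owns it
}

const replicas = 50 // hypothetical number of virtual points per node

func newRing(nodes ...string) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		r.add(n)
	}
	return r
}

// add places replicas virtual points for node and keeps points sorted.
func (r *ring) add(node string) {
	for i := 0; i < replicas; i++ {
		p := crc32.ChecksumIEEE([]byte(node + "#" + strconv.Itoa(i)))
		r.owner[p] = node
		r.points = append(r.points, p)
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// lookup returns the owner of the first point at or after the key's hash.
func (r *ring) lookup(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

// cidKeys builds n synthetic CID-like keys.
func cidKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = "cid-" + strconv.Itoa(i)
	}
	return keys
}

func main() {
	r := newRing("node-a", "node-b", "node-c")
	before := map[string]string{}
	for _, cid := range cidKeys(1000) {
		before[cid] = r.lookup(cid)
	}
	r.add("node-d")
	moved := 0
	for cid, prev := range before {
		if r.lookup(cid) != prev {
			moved++ // in this ring, a moved key can only have gone to node-d
		}
	}
	fmt.Printf("%d of %d cids moved\n", moved, len(before))
}
```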
I'm going to merge this into adaptive so we have one PR tracking all of this and can get to a final merge.