filecoin-saturn / caboose

A blockstore for distributing load
Other
12 stars 1 forks source link

feat: modify pool tests #153

Closed AmeanAsad closed 11 months ago

AmeanAsad commented 1 year ago

Hey @willscott noticing that the ConsiderUpdateWeightedNode function in the hashring lib is not working entirely as intended. It's returning negative deltas for new nodes that are just getting added to the pool.

willscott commented 1 year ago

Hey @willscott noticing that the ConsiderUpdateWeightedNode function in the hashring lib is not working entirely as intended. It's returning negative deltas for new nodes that are just getting added to the pool.

you believe this is a bug in the considerupdate? do you have a failing test against it that you can point me to so that i can help debug it?

AmeanAsad commented 1 year ago

@willscott

you believe this is a bug in the considerupdate? do you have a failing test against it that you can point me to so that i can help debug it?

I think it would be helpful to just get an idea of the expected behavior here to understand what Im testing against. The example im going off is lets say I have a new node that is not currently in the hash ring and I run ConsiderUpdateWeightedNode("nodeUrl", 1), what do I expect to receive?

willscott commented 1 year ago

I would expect ConsiderUpdateWeightedNode for a new node to return a negative value - you're taking part of the hash ring that would currently send it's requests to a node with known latency/performance characteristics, and proposing giving that space instead to a node with unknown characteristics. The expected outcome of that is worse.

One way that confidence gets better is if mirroring has already sent some test requests to the new node, so we have some sense that it could be good.

the other question is how to handle initial startup - we probably need to enforce some sort of minimum reasonable size where we're okay with negative updates if we fall below.

AmeanAsad commented 1 year ago

@willscott the reason why the tests are passing is because the PoolConsiderationCount is 30 so all the nodes in the test automatically get added in the pool. I changed that value earlier temporarily for testing with smaller groups of nodes to have easier debugging.

AmeanAsad commented 1 year ago

@willscott a few things I found:

The Rate() for nodes with no stats is always zero so the first test always fails because the delta will always be zero comparing a node that has had stats with one that has none.

AmeanAsad commented 1 year ago

@aarshkshah1992 test flakiness addressed.

cc @willscott was able to reproduce flakiness locally and made changes to the topN selection. I believe just using heap.Init only guarantees sorting of the first element.

AmeanAsad commented 1 year ago

@willscott added a test that targets testing cache affinity. Relevant commit is here: https://github.com/filecoin-saturn/caboose/pull/153/commits/1975f49d26b9a2a28234d4e40aa71118acc0c060

The test tries to ensure that if we have a set of good performing nodes that we have previously made cid requests to, we still hit those nodes for those cids even if other "similar" performing nodes join the pool. Currently that test is failing, and we are seeing new nodes that join the pool show up as first candidate nodes for those cids instead.

willscott commented 11 months ago

I'm going to merge this into adaptive so we have one PR tracking all of this and we can get to final merge