Open leventov opened 5 years ago
@leventov why would you choose to use reservoir choice of a server as opposed to randomly choosing a server, since we already know the size of servers? Given this random server, we could then choose a random segment using the hack you've shown (very cool btw!).
I've put up https://github.com/apache/incubator-druid/pull/7174 with the approach that you suggested, but it chooses a server at random instead of using reservoir sampling (I can change this if required).
@shivtools the previous implementation was choosing a segment uniformly at random in a cluster. If you just choose a random server first, segments that are served by less populated servers have a higher chance of being chosen overall. Or am I wrong?
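To make the bias concrete, here is a small sketch (not Druid code; the function names and the two-server cluster are made up for illustration) that computes the exact chance of one particular segment being picked under each scheme. It assumes every server holds at least one segment.

```python
from fractions import Fraction


def uniform_server_then_segment(segment_counts):
    """Chance of picking one given segment on each server when the server
    is chosen uniformly and then a segment uniformly on that server."""
    n_servers = len(segment_counts)
    return [Fraction(1, n_servers) * Fraction(1, c) for c in segment_counts]


def weighted_server_then_segment(segment_counts):
    """Same, but the server is chosen with probability proportional to
    its number of segments."""
    total = sum(segment_counts)
    return [Fraction(c, total) * Fraction(1, c) for c in segment_counts]


# Hypothetical cluster: one server with 1 segment, another with 99.
counts = [1, 99]
print(uniform_server_then_segment(counts))   # 1/2 vs 1/198 per segment
print(weighted_server_then_segment(counts))  # 1/100 for every segment
```

With a uniform server choice, the lone segment on the small server is picked half the time, while weighting servers by population restores 1/100 for every segment in the cluster.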
Hey @leventov, you're right! I tuned the server selection to use reservoir sampling as you suggested.
The current algorithm is O(N_SEGMENTS_IN_CLUSTER). It can be O(N_SERVERS_IN_CLUSTER) + O(log(N_MAX_SEGMENTS_ON_A_SERVER)) via reservoir choice of a server first, using the populations of servers as weights; then a random segment can be chosen on that server using this hack.
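A rough sketch of the two-step idea, in Python rather than Druid's Java and with made-up names (`pick_server_weighted`, `pick_segment`, and the `(name, segments)` pairs are all assumptions for illustration). The server is chosen in a single pass with a size-1 weighted reservoir. The per-server step here simply indexes into a list, which is only a stand-in for the hack mentioned above and is not meant to reproduce it.

```python
import random


def pick_server_weighted(servers, rng=random):
    """Single-pass, size-1 weighted reservoir over (name, segments) pairs.
    Each server ends up chosen with probability len(segments) / total."""
    chosen = None
    total = 0
    for server in servers:
        weight = len(server[1])
        if weight == 0:
            continue  # an empty server can never supply a segment
        total += weight
        # Replace the current choice with probability weight / total.
        if rng.randrange(total) < weight:
            chosen = server
    return chosen


def pick_segment(servers, rng=random):
    """Pick a (server name, segment) pair uniformly over all segments."""
    server = pick_server_weighted(servers, rng)
    if server is None:
        return None
    name, segments = server
    # Stand-in for the per-server step: plain random index into a list.
    return name, segments[rng.randrange(len(segments))]


# Hypothetical cluster: "a" serves 1 segment, "b" serves 9.
cluster = [("a", ["s0"]), ("b", ["s%d" % i for i in range(1, 10)])]
print(pick_segment(cluster))
```

The pass over servers touches each server once and never iterates over segments, which is where the O(N_SERVERS_IN_CLUSTER) term comes from; the cost of the second step depends on how segments are stored on a server.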