The solution is to employ some query similarity metric between the seen queries and the queries in the workload. This would allow us to maintain a fixed routing function, which is whatever is computed by RITA.
The question is how can we design a good similarity function. The idea is to use INUM again to our advantage.
For a query q we compute a vector v_q that has dimensionality x_1+…+x_n, where x_j is the number of indexes materialized at replica j. Let i be an index in in [1,x_1+…+x_n] that corresponds to index i at replica j. Then v_q[i]=1 if i is used in the optimal plan of q at replica j, and 0 otherwise.
Given two queries q and q', we compute their similarity as the similarity of the binary vectors v_q and v_q', using the following intuition: q and q' are similar if they benefit from the same indexes in different replicas. Maybe we can refine this further, e.g., instead of taking into account just the optimal plan, we take into account the top-2 plans for each query.
The solution is to employ some query similarity metric between the seen queries and the queries in the workload. This would allow us to maintain a fixed routing function, which is whatever is computed by RITA.
The question is how can we design a good similarity function. The idea is to use INUM again to our advantage.
For a query q we compute a vector v_q that has dimensionality x_1+…+x_n, where x_j is the number of indexes materialized at replica j. Let i be an index in in [1,x_1+…+x_n] that corresponds to index i at replica j. Then v_q[i]=1 if i is used in the optimal plan of q at replica j, and 0 otherwise.
Given two queries q and q', we compute their similarity as the similarity of the binary vectors v_q and v_q', using the following intuition: q and q' are similar if they benefit from the same indexes in different replicas. Maybe we can refine this further, e.g., instead of taking into account just the optimal plan, we take into account the top-2 plans for each query.