richardstartin closed this 2 years ago
cc @amrishlal
Working on it
One approach could be:
- Send query to all segments and all servers
- On the broker, pick the response with deepest tree and avoid merging across segments
I think this is fairly simple
While this addresses the general accuracy issues raised above with picking a random segment on a random server, there will still be some inaccuracy in cases where a server's physical operator tree really does differ from segment to segment. That can be handled as a separate follow-up.
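A rough sketch of the broker-side "pick the deepest tree" step, assuming each segment's EXPLAIN PLAN comes back as a simple operator tree. `PlanNode`, `depth` and `pick_deepest` are hypothetical names for illustration, not Pinot classes, and the operator names in the usage are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical operator-tree node (not Pinot's actual plan class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def depth(node: PlanNode) -> int:
    """Number of operators on the longest root-to-leaf path."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


def pick_deepest(plans: List[PlanNode]) -> PlanNode:
    """Return the single deepest per-segment plan, without merging across segments.

    Ties go to the first plan seen, because max() returns the first maximal element.
    """
    if not plans:
        raise ValueError("no plans to choose from")
    return max(plans, key=depth)
```

For example, given a pruned segment (`PlanNode("PRUNED")`) and a scanned one (`PlanNode("AGGREGATE", [PlanNode("FILTER", [PlanNode("SCAN")])])`), the scanned plan wins. Depth is only a proxy for "did real work", which is the trade-off being discussed here.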
I think it's misleading to choose just one child; it's not a degenerate case for there to be a mix, where some segments can be skipped based on metadata and others need to perform real work. This can also happen if there has been a configuration change and the segments are in a mixed state.
If we want to be 100% accurate, then probably the only option is to move towards full evaluation of EXPLAIN PLAN over all segments on all servers. To start with, maybe we could evaluate EXPLAIN PLAN against one segment on each server and do a broker reduce and dedup over all the servers. Later on, a server-side combine could be added for evaluating against all segments? For now, although not ideal, the user can execute the EXPLAIN PLAN a few times to get a better idea of the variation.
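A minimal sketch of the "broker reduce and dedup over all the servers" idea, assuming each server returns one plan (from one of its segments) in a hashable nested-tuple form. `Plan` and `dedup_server_plans` are hypothetical names; the real broker reduce would operate on Pinot's own response types.

```python
from typing import Dict, List, Tuple

# Hypothetical hashable plan representation: (operator_name, (child, child, ...)).
Plan = Tuple[str, tuple]


def dedup_server_plans(plans_by_server: Dict[str, Plan]) -> Dict[Plan, List[str]]:
    """Group servers by the plan they returned, so each distinct plan appears once.

    Servers are visited in sorted order, so each server list is sorted.
    """
    grouped: Dict[Plan, List[str]] = {}
    for server, plan in sorted(plans_by_server.items()):
        grouped.setdefault(plan, []).append(server)
    return grouped
```

Keeping the list of servers per distinct plan means the output can show that, say, two servers scanned while a third pruned everything, rather than silently reporting only one of them.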
I think the deepest-child heuristic might be the right trade-off between accuracy and whatever it is that prevents considering the entire query execution.
Change is in progress. Will meet with @Jackie-Jiang / @richardstartin offline to share/discuss the approach once
I have a query as follows:
It produces a nonzero result:
However, when I try an explain plan:
The query plan picks a segment at random, one which should have been pruned, so the plan doesn't reflect the way the query is actually evaluated:
It would be helpful if the plan chose a segment which has data, or queried all segments and merged operators when the operators vary from segment to segment.
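One way to picture "query all segments and merge operators" is a tree merge that annotates each operator with how many segment plans contained it. This is a sketch under assumptions: per-segment plans are `(operator_name, [children])` tuples, children are matched by operator name, and `MergedNode`, `merge_into` and `merge_segment_plans` are hypothetical names, not Pinot APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical per-segment plan: (operator_name, [child_plan, ...]).
SegmentPlan = Tuple[str, list]


@dataclass
class MergedNode:
    name: str
    # Occurrence count; siblings with the same name are folded together,
    # so for such siblings this counts occurrences rather than distinct segments.
    segments: int = 0
    children: Dict[str, "MergedNode"] = field(default_factory=dict)


def merge_into(target: MergedNode, plan: SegmentPlan) -> None:
    """Fold one segment's plan into target (target.name must equal plan[0])."""
    _, children = plan
    target.segments += 1
    for child in children:
        node = target.children.setdefault(child[0], MergedNode(child[0]))
        merge_into(node, child)


def merge_segment_plans(plans: List[SegmentPlan]) -> Dict[str, MergedNode]:
    """Merge per-segment plans; differing roots (e.g. pruned vs scanned) stay separate."""
    roots: Dict[str, MergedNode] = {}
    for plan in plans:
        root = roots.setdefault(plan[0], MergedNode(plan[0]))
        merge_into(root, plan)
    return roots
```

With two scanned segments and one pruned segment, the result has an `AGGREGATE` root seen in 2 segments and a `PRUNED` root seen in 1, so the mixed state described above stays visible instead of depending on which segment happened to be picked.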