Open balegas opened 2 months ago
Terminology nit: for me "match single write with millions of shapes" is confusing as I associate "match" with actually matching the row to the shape. I suggest we "evaluate" / "compare" / "filter" the row against millions of shapes but only match on the ones that ... match.
Then a couple of notes:
With writes out of the critical path, the main question there is how big of bursts can we handle and how much sustained write throughout can we handle before we fall apart. We also should be testing various hard drive setups as well to see how much that impacts things.
I would assume that reducing the times a where clause needs to be evaluated to do the matching would be a core aspect of this
Regardless of optimisation we can do to specific types of shapes, we still need the system to be able to handle a certain number of where clauses.
perhaps we can be more specific about the requirement for how performance can acceptably degrade with more matched shapes
That's a good point, but my intention is to say that we can handle a lot more than what people would typically use. We can get to performance degradation in the future
Regardless of optimisation we can do to specific types of shapes, we still need the system to be able to handle a certain number of where clauses.
Sorry, I don’t follow. My point is about the algorithm for efficient matching. Not about the type of shapes.
For example, shape 1 has where a and b. Shape 2 has where b and c. Somehow only run the “where b” part of the check once …
There is a large compute overhead for evaluating shape where clauses.
What are the performance numbers we're looking into? The shape matching algorithm should be able to evaluate a single operation against millions of shapes. The algorithm should work well for cases where a single operation only touches few shapes:
We have a few ideas of optimizations we could do:
We expect apps to not have a lot of shape definitions, but having at least one shape that has millions of instances, e.g. user data shapes. It makes sense to optimize those kinds of shapes if necessary.