Open okennedy opened 6 years ago
@mrb24 @willspoth
I'm implementing this at the moment. It's going reasonably well, but I'm running into one conceptual hurdle that I wanted to bounce off of y'all, since it involves messing with the AdaptiveSchema interface.
The question concerns the orientation of the schema table. For example, let's say we have a table R(A int, B float, C string). At present, this is represented in adaptive schemas as:
ATTR_NAME | ATTR_TYPE |
---|---|
A | int |
B | float |
C | string |
Let's call this "Vertical" orientation. There's also the possibility of representing it in a "Horizontal" orientation:
A | B | C |
---|---|---|
int | float | string |
Horizontal is far easier to work with in the typechecker, since it mirrors the natural behavior of columns. It creates simpler queries, and probably more efficient ones too. Vertical, however, is what `SYS_ATTRS` expects to see (and there's not really any way around that... unless we go to JSON or something).
I've created a pair of utility functions, `pivot` and `unpivot`, in `OperatorConstructors` to help translate back and forth between these structures, but the optimizer is not going to have a clue what to do with the queries that pop out, and we're going to end up with these monolithic queries that are a PITA to work with.
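To make the two orientations concrete, here's a minimal Python sketch of what the translation does to the R(A int, B float, C string) example. This is not the code in `OperatorConstructors` (that builds query operators, this just shuffles in-memory values); the type aliases and the duplicate-name check are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Vertical schema: one (ATTR_NAME, ATTR_TYPE) row per column.
VerticalSchema = List[Tuple[str, str]]
# Horizontal schema: a single row whose columns are the attribute names.
HorizontalSchema = Dict[str, str]


def pivot(vertical: VerticalSchema) -> HorizontalSchema:
    """Vertical -> Horizontal: each row becomes a column of one row."""
    horizontal: HorizontalSchema = {}
    for name, typ in vertical:
        if name in horizontal:
            # Assumption: attribute names are unique within a schema.
            raise ValueError(f"duplicate attribute: {name}")
        horizontal[name] = typ
    return horizontal


def unpivot(horizontal: HorizontalSchema) -> VerticalSchema:
    """Horizontal -> Vertical: one (ATTR_NAME, ATTR_TYPE) row per column."""
    # dicts preserve insertion order, so column order survives the round trip.
    return [(name, typ) for name, typ in horizontal.items()]


r_vertical = [("A", "int"), ("B", "float"), ("C", "string")]
r_horizontal = pivot(r_vertical)
round_trip = unpivot(r_horizontal)
```

The round trip is lossless here, which is the property that lets one orientation be derived from the other.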
As one way of mitigating this, I plan to create two functions: one for each orientation. The Horizontal orientation will be the native one, and to get the Vertical orientation we'll just compute the Horizontal version and unpivot it.
However, this still means that anytime an adaptive schema element shows up in a query we end up pivoting+unpivoting its schema. This probably deserves its own issue for discussion... but I'm wondering whether it might be useful to rejigger the adaptive schema interface to have an option of producing horizontal schemas as well...
Adaptive Schemas let us dynamically create system catalog tables. For now, these tables are more or less hardcoded: a particular row in a schema table is uncertain if the corresponding adaptive schema introduces uncertainty into it.
For example, take the output of a typical `LOAD` operation, which creates 3 views.
If we do a `SELECT * FROM SYS_ATTRS`, we'll see uncertainty on the names of the columns in `view_dh`, and uncertainty on the types of the columns in `view_ti`, but the schema of `view` is fixed. This is outright wrong.
This is, unfortunately, the result of having to translate `db.typechecker.schemaOf` into a table that the adaptive schemas can reason about, since we don't have a way of querying for these schemas directly.
The aim here is to build a typechecker that operates over schema tables. Obviously this is going to be less efficient than just running the regular typechecker, but it allows us to do a few useful things:
SYS_ATTRS
To be precise, the aim here is to write a function `TypecheckQuery.compile: Operator => Operator`.
The input to the function is a normal query. The output of the function is a query with schema `(ATTR_NAME, ATTR_TYPE)`. Assuming a fully deterministic input query, this function should produce output identical to:
Conversely, if the input query runs over one or more adaptive schemas, then this query should produce the same result, but have VGTerm expressions in the right places.
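As a rough illustration of the "query in, schema query out" idea, here's a small Python sketch that compiles a toy operator tree into a SQL query over a catalog table and runs it in SQLite. Everything in it is hypothetical: the `Table`/`Project`/`Select` classes, `compile_schema_query`, and the `TABLE_NAME` column on `SYS_ATTRS` are assumptions for the sake of the example, not the real operator classes or catalog layout, and it only covers the deterministic case (no VGTerm expressions).

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Operator"


@dataclass(frozen=True)
class Select:
    condition: str  # opaque here: filtering rows does not change the schema
    child: "Operator"


Operator = Union[Table, Project, Select]


def quote(value: str) -> str:
    """Render a Python string as a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def compile_schema_query(op: Operator) -> str:
    """Return SQL that yields (ATTR_NAME, ATTR_TYPE) rows for op's schema."""
    if isinstance(op, Table):
        return ("SELECT ATTR_NAME, ATTR_TYPE FROM SYS_ATTRS "
                f"WHERE TABLE_NAME = {quote(op.name)}")
    if isinstance(op, Select):
        # Selection keeps every column, so reuse the child's schema query.
        return compile_schema_query(op.child)
    if isinstance(op, Project):
        keep = ", ".join(quote(c) for c in op.columns)
        return ("SELECT ATTR_NAME, ATTR_TYPE "
                f"FROM ({compile_schema_query(op.child)}) AS child "
                f"WHERE ATTR_NAME IN ({keep})")
    raise TypeError(f"unsupported operator: {op!r}")


# Assumed catalog layout: one vertical row per (table, attribute).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE SYS_ATTRS (TABLE_NAME TEXT, ATTR_NAME TEXT, ATTR_TYPE TEXT)")
db.executemany(
    "INSERT INTO SYS_ATTRS VALUES (?, ?, ?)",
    [("R", "A", "int"), ("R", "B", "float"), ("R", "C", "string")])

query = Project(("A", "C"), Select("B > 1.0", Table("R")))
schema_rows = sorted(db.execute(compile_schema_query(query)).fetchall())
```

Note that the schema query itself works on the Vertical orientation, which keeps this sketch short; it says nothing about which orientation would be easier once expressions and type inference enter the picture.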