Query-Based Typechecker

UBOdin / mimir

Data-ish exploration through SQL+Uncertainty

Apache License 2.0

Adaptive Schemas let us dynamically create system catalog tables. For now, these tables are more or less hardcoded: a particular row in a schema table is uncertain only if the corresponding adaptive schema itself introduces uncertainty into it.

For example, take the output of a typical LOAD operation, which creates 3 views:

view_dh: for header detection
view_ti: for type inference
view: for user-friendly interactions

If we do a SELECT * FROM SYS_ATTRS, we'll see uncertainty on the names of the columns in view_dh and uncertainty on the types of the columns in view_ti, but the schema of view is reported as fixed. This is outright wrong.

This is, unfortunately, the result of having to translate db.typechecker.schemaOf into a table that the adaptive schemas can reason about, since we don't have a way of querying for these schemas directly. The aim here is to build a typechecker that operates over schema tables. Obviously this is going to be less efficient than just running the regular typechecker, but it allows us to do a few useful things:

We can propagate uncertainty through SYS_ATTRS
We can combine it with POSSIBLY EXISTS queries to figure out if a particular attribute could possibly exist.

To be precise, the aim here is to write a function TypecheckQuery.compile: Operator => Operator. The input to the function is a normal query. The output of the function is a query with schema (ATTR_NAME, ATTR_TYPE). Assuming a fully deterministic input query, this function should produce output identical to:

TypecheckQuery.compile(input: Operator) = 
  HardTable(
    Seq(
      ("ATTR_NAME", TString()),
      ("ATTR_TYPE", TType())
    ),
    db.typechecker.schemaOf(input).map { _.toSeq }
  )
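
The idea of a typechecker that "operates over schema tables" can be illustrated with a toy model. The sketch below is not Mimir code: the `Type` and `Operator` hierarchies, `schemaOf`, `eval`, and `compile` are all hypothetical stand-ins, reduced to three operators. It assumes one possible design in which `compile` is defined recursively, so that the schema of a projection is itself a query (a selection) over the compiled schema table of its source.

```scala
// Toy stand-ins for a relational algebra; every name here is hypothetical
// and only loosely mirrors the identifiers used in this issue.
object SchemaQuerySketch {
  sealed trait Type
  case object TInt extends Type
  case object TFloat extends Type
  case object TString extends Type
  case object TType extends Type // the type of the ATTR_TYPE column

  sealed trait Operator
  // A literal table: a schema plus rows of raw values.
  final case class HardTable(schema: Seq[(String, Type)], rows: Seq[Seq[Any]]) extends Operator
  // Keep only the named columns.
  final case class Project(columns: Seq[String], source: Operator) extends Operator
  // Keep only rows that satisfy the predicate; the schema is unchanged.
  final case class Select(predicate: Seq[Any] => Boolean, source: Operator) extends Operator

  // The "regular" typechecker: computes the schema outside the query language.
  def schemaOf(op: Operator): Seq[(String, Type)] = op match {
    case HardTable(schema, _) => schema
    case Project(columns, source) =>
      val child = schemaOf(source).toMap
      columns.map(c => c -> child(c))
    case Select(_, source) => schemaOf(source)
  }

  // A tiny evaluator, needed only to look at the rows a query produces.
  def eval(op: Operator): Seq[Seq[Any]] = op match {
    case HardTable(_, rows) => rows
    case Project(columns, source) =>
      val names = schemaOf(source).map(_._1)
      val positions = columns.map(c => names.indexOf(c))
      eval(source).map(row => positions.map(i => row(i)))
    case Select(predicate, source) => eval(source).filter(predicate)
  }

  val schemaTableSchema: Seq[(String, Type)] =
    Seq("ATTR_NAME" -> TString, "ATTR_TYPE" -> TType)

  // The query-based typechecker: returns a query whose rows describe the
  // schema of `input`. Rows come out in the source's column order, so a
  // reordering projection would need an extra sort step (omitted here).
  def compile(input: Operator): Operator = input match {
    case HardTable(schema, _) =>
      HardTable(schemaTableSchema, schema.map { case (name, t) => Seq[Any](name, t) })
    case Select(_, source) =>
      compile(source) // selection does not change the schema
    case Project(columns, source) =>
      // A query over the source's schema table: keep the projected attributes.
      Select(row => columns.contains(row.head), compile(source))
  }
}
```

In this toy model, evaluating `compile(q)` for a deterministic `q` whose projections preserve the source's column order yields the same rows as mapping `schemaOf(q)` into (name, type) pairs, which is the equivalence the definition above asks for.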

If, instead, the input query runs over one or more adaptive schemas, then the output query should produce the same result, but with VGTerm expressions in the right places.
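
To make the uncertain case more concrete, here is a minimal sketch of what rows of an uncertain schema table might look like. It is not Mimir's representation: `Cell`, `Known`, `Guess`, `AttrRow`, and the helper functions are hypothetical, and `Guess` is only loosely analogous to a cell computed by a VGTerm. The sketch assumes each uncertain cell carries a best guess plus a finite set of alternatives, and that "possibly exists" means "exists under at least one choice of alternatives".

```scala
// Hypothetical model of an uncertain (ATTR_NAME, ATTR_TYPE) table.
object UncertainSchemaSketch {
  // A cell is either a fixed value or a best guess with alternatives.
  sealed trait Cell[A] {
    def bestGuess: A
    def possible: Set[A]
    def isCertain: Boolean
  }
  final case class Known[A](value: A) extends Cell[A] {
    def bestGuess: A = value
    def possible: Set[A] = Set(value)
    def isCertain: Boolean = true
  }
  // Loosely analogous to a cell whose value comes from a VGTerm.
  final case class Guess[A](bestGuess: A, alternatives: Set[A]) extends Cell[A] {
    def possible: Set[A] = alternatives + bestGuess
    def isCertain: Boolean = false
  }

  // One row of the schema table; types are plain strings to keep the sketch small.
  final case class AttrRow(name: Cell[String], tpe: Cell[String])

  // What a plain SELECT over the schema table would show: best guesses only.
  def bestGuessSchema(rows: Seq[AttrRow]): Seq[(String, String)] =
    rows.map(r => (r.name.bestGuess, r.tpe.bestGuess))

  // Rows whose name or type depends on a guess.
  def uncertainRows(rows: Seq[AttrRow]): Seq[AttrRow] =
    rows.filter(r => !(r.name.isCertain && r.tpe.isCertain))

  // True if some choice of alternatives gives an attribute with this name.
  def possiblyExists(rows: Seq[AttrRow], attr: String): Boolean =
    rows.exists(_.name.possible.contains(attr))
}
```

For example, a row `AttrRow(Guess("A", Set("COLUMN_0")), Known("string"))` shows up as `A` in the best-guess schema, while `possiblyExists` returns true for both `A` and `COLUMN_0`.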

ATTR_NAME	ATTR_TYPE
A	int
B	float
C	string

A	B	C
int	float	string

Query-Based Typechecker #288