UBOdin / mimir

Data-ish exploration through SQL+Uncertainty
http://mimirdb.info
Apache License 2.0
27 stars 13 forks source link

GROUP BY column names are case-sensitive. #326

Closed okennedy closed 5 years ago

okennedy commented 5 years ago

I want to spend a little more time thinking about the right fix.

The main issue is that un-qualified group-by variables are mapped to tables using the tablesForColumn map, defined here. This lookup is case-sensitive. The actual table lookup takes place here.

In principle, the group-by column list can/should be replaced by a NameLookup-style binding map. It gets used to derive the schema of the input to the HAVING block, as well as any post-aggregate projections. In most of those cases though, bindings should be sufficient. There seems to be some effort to preserve the table names of the group-by columns, but I can't, for the life of me, remember why. I'm thinking it was for consistency in NameLookup use, but if that's all it is, we should be able to remove a bunch of code in SQLToRA and everything should still just work.

This is an after SIGMOD bugfix, imo.

okennedy commented 5 years ago

There seems to be a similar problem with table names in the FROM clause (see the Vizier invasive species test)

okennedy commented 5 years ago

This is a sneaky bug. Name works in some map types, but not others. It seems like the test harness is getting a lighter-weight map that respects .equals, but the compiled version gets one that doesn't...

okennedy commented 5 years ago

The reason we're keeping around table names is that range variables can still be used in later parts of the query. We need to essentially create a projection of the source schema to support queries like:

SELECT R.A FROM R GROUP BY A HAVING R.A > 4

Specifically, we need to have a safer way to dig up the association between A and R in the group by clause.