diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License
2 stars 1 forks source link

sql names quoting #213

Closed AndreiBozantan closed 4 years ago

AndreiBozantan commented 4 years ago

Using the new DSqlObjectName class for quoting names in SQL queries.

I didn't end up using the ICustomFormatter (https://thomaslevesque.com/2015/02/24/customizing-string-interpolation-in-c-6/) because we include strings with multiple semantics in our formatted queries: SQL object names, string values, some SQL identifiers.

dandanlen commented 4 years ago

I was thinking of having a custom formatter that we can use on-demand to escape just the values we want using a label like Quote or Escape.

For example we could use it like:

var queryStatement = $@"
    select {column:Quote} 
    from {table:Quote} 
    where {column:Quote} between {lower} and {upper}"

I think this is less invasive than wrapping every value in a new type - we only need to escape it in the actual query, not everywhere it's used.

AndreiBozantan commented 4 years ago

The current approach has some benefits:

My first approach to this (I didn't commit it) was to use a static method with a using static ... statement, which will give us something like below, without a custom formatter:

var queryStatement = $@"
    select {Q(column)} 
    from {Q(table)}
    where {Q(column)} between {lower} and {upper}"

But I quickly realized that this involves a lot of error prone work: how do I make sure that I modified all the necessary places?

dandanlen commented 4 years ago

I suppose what bugs me about this solution is that we have to specify everywhere in the codebase where we use the Column or Table that it's an sql fragment - even though this is only significant when we actually build the sql statement! What happens if we need some other kind of quoting or escaping? Each time we will need to create a new class and replace it everywhere in the codebase, just to have correct sql escaping.

Here is a potential solution using a custom formatter: We can require format specifiers for all of the interpolated fragment. If you throw an exception when there's no format specifier then we get an immediate error at runtime so it should be caught by tests (sadly not the compiler).

Here's an example: https://dotnetfiddle.net/TZVt8s

This would also allow us to insert literal fragments into the sql statement, like whole reusable where clauses. Or even more complex stuff: maybe we could auto-generate a comma-separated list of values from a list of inputs, for example.

Imagine this:

var whereCondition = SqlFormatter.Format("{col1:Quote} > {someVar:Raw}");
var statement = SqlFormatter.Format(@$"
    select {cols:QuotedList} 
    from {table:Quote} 
    where {whereCondition:Raw}
    ");
AndreiBozantan commented 4 years ago

I suppose what bugs me about this solution is that we have to specify everywhere in the codebase where we use the Column or Table that it's an sql fragment - even though this is only significant when we actually build the sql statement.

I think that the new DSqlObjectName type is used just in the right places:

There are another couple of places where we use objects of this type, but without specifying the type, so I don't find the usage excesive.

What happens if we need some other kind of quoting or escaping? Each time we will need to create a new class and replace it everywhere in the codebase, just to have correct sql escaping.

Frankly I didn't think about such scenarios until now, but I don't see a problem with using additional new types if necessary as parameters to query constructors. In fact as I already said, I think that this is beneficial since the compiler will help with the type checks.

We can require format specifiers for all of the interpolated fragment. If you throw an exception when there's no format specifier then we get an immediate error at runtime so it should be caught by tests (sadly not the compiler).

This sounds like a possible solution, but I am not sure that this is better than the current one. I don't like the idea of polluting the SQL statements with additional format specifiers. Even if we check that there is a format specifier, there is no guarantee that I will use the correct one. Also, there is no kind of easy documentation for the format specifiers - I will always have to look up the correct syntax (just like I do for String.Format and printf functions). The current solution provides the benefit of type signatures, for which we have code completion in the IDE, or in the worst case the compiler errors.

AndreiBozantan commented 4 years ago

I will try to come up with a better solution later, after the release.