canonical / sqlair

Friendly type mapping for SQL databases
Apache License 2.0
17 stars 9 forks source link

Support for Get with a partial list of values #106

Closed letFunny closed 1 year ago

letFunny commented 1 year ago

Adds support for calling query.Get() with a partial list of values, not necessarily including all the types which were specified when preparing the statement.

letFunny commented 1 year ago

The rationale for this PR is to be more flexible with the data retrieval. Right now, if the user wants to only get a piece of the data of an existing prepared statement, he has to either create dummy variables himself for all the others, or create another statement with only the necessary data. After this PR, we can specify directly only the fields that we want.

The disadvantage is that we will no longer report an error when the user forgets to provide one of the data variables. I think it is fair trade-off as it will be akin to forgetting to update a value in your code and it is not the responsibility of the language/libraries to correct your logic.

letFunny commented 1 year ago

I disagree with the philosophy of this. It should be an error to not use your SQL statement correctly, and developers need to understand when they've used it wrong, rather than silently omitting it.

its the same problem in, eg, python:

a, b = func_returning_two()
a, b = func_returrning_three() # error incorrect assignment

If people don't care about the object they should a) Rewrite the query. Its doing work that it doesn't need to be doing, and if you want something different, that's ok, but is your responsibility. b) You can pass the argument, and throw it away (akin to):

a, b, _ = func_returning_three()

But it should be a visble 'I'm ignoring this' not a silent library didn't tell you.

I agree with your point and I think it nicely expands on what I have written above about the disadvantages of this change. Let me try to quote your examples to explain my reasoning better than what I did previously.

Firstly, regarding ergonomics, I think the fair comparison would be against Python optional arguments which are similar to what we are using here, a variadic list which is not type checked:

def func(*args, **kwargs):
     # Either use defaults for missing arguments or raise an error.

and because Python is a much more expressive language you can do things such as:

func(first=1, third=3) # omit second argument

but in SQLair and Go in order to omit an argument you have to do:

var arg1 type1
var arg2 type2
....
var argN typeN

q.Get(&arg1, &arg2, ..., &argN)

and discard the variables, which is much more verbose and produces less clear code.

Apart from ergonomics, I think there is a valid use case for having partial list of values. Let's take the following query as an example:

SELECT &Time.*, &Error.* FROM aggregated_log WHERE very_complex_condition(...)

Let's say that sometimes we care about the error and timestamps and other times we only care about the errors themselves. So, naturally, we could split this into two queries:

SELECT &Time.*, &Error.* FROM aggregated_log WHERE very_complex_condition(...)
SELECT &Error.* FROM aggregated_log WHERE copy_of_very_complex_condition(...)

which as you correctly state is more performant and it is clearer what you are asking for in each query. The problem I see with this approach is that, for some use cases, the overhead of having to keep very_complex_condition and copy_of_very_complex_condition synchronized and the subtle bugs this introduces, may be greater than the performance penalty of asking for the extra columns. That is why we wanted to go for the flexibility of having the user decide what is the approach that makes the most sense for their concrete use case, even if in the vast majority of instances they split the queries as you correctly point out.

Lastly, regarding the following statement

It should be an error to not use your SQL statement correctly

I think again you are right and that would be the ideal case. However, with SQLair we are taking the approach of being a very lightweight layer which is as flexible as possible to allow the greatest amount of syntax out of the box. That is why we do not parse the SQL directly and why we do not report errors but let the queries fail later. In that same spirit, I think this change adds the same flexibility and makes the user responsible for using the API correctly in the same manner that he is responsible for the logic in his code (i.e. updating a value where it should be updated).

jameinel commented 1 year ago

So I agree with partially defaulted input parameters. But here you are dealing with "optional" output parameters, and I think there is a significant difference between that.

While there can be some examples where it isn't so bad to not take all the outputs. I would argue that 90% of the time if you didn't parse all the output it was by mistake. And I'd rather have people pass in a no-op for that 10% of the time than all the times where you have hard to understand bugs because the library silently didn't actually set the value that you thought it was. You had created the variable, and even used it later in your code, but forgot to pass it in to actually receive the value.

I really do prefer the "a, b, _ =" with an explicit "I'm ignoring the 3rd parameter". I realize it is a bit trickier because you have to pass in something that matches the type (because otherwise who needs it).

Is this something that had a real consumer that really needed a complex query reused that really wanted the SQL to evaluate that extra content but throw it away? (because if your query really is that complex, then you are potentially reading a lot of rows to merge them/etc just to throw away that data when you were done.)

On Thu, Sep 21, 2023 at 1:33 PM Alberto Carretero @.***> wrote:

I disagree with the philosophy of this. It should be an error to not use your SQL statement correctly, and developers need to understand when they've used it wrong, rather than silently omitting it.

its the same problem in, eg, python:

a, b = func_returning_two() a, b = func_returrning_three() # error incorrect assignment

If people don't care about the object they should a) Rewrite the query. Its doing work that it doesn't need to be doing, and if you want something different, that's ok, but is your responsibility. b) You can pass the argument, and throw it away (akin to):

a, b, _ = func_returning_three()

But it should be a visble 'I'm ignoring this' not a silent library didn't tell you.

I agree with your point and I think it nicely expands on what I have written above about the disadvantages of this change. Let me try to quote your examples to explain my reasoning better than what I did previously.

Firstly, regarding ergonomics, I think the fair comparison would be against Python optional arguments which are similar to what we are using here, a variadic list which is not type checked:

def func(*args, **kwargs):

Either use defaults for missing arguments or raise an error.

and because Python is a much more expressive language you can do things such as:

func(first=1, third=3) # omit second argument

but in SQLair and Go in order to omit an argument you have to do:

var arg1 type1var arg2 type2....var argN typeN q.Get(&arg1, &arg2, ..., &argN)

and discard the variables, which is much more verbose and produces less clear code.

Apart from ergonomics, I think there is a valid use case for having partial list of values. Let's take the following query as an example:

SELECT &Time., &Error. FROM aggregated_log WHERE very_complex_condition(...)

Let's say that sometimes we care about the error and timestamps and other times we only care about the errors themselves. So, naturally, we could split this into two queries:

SELECT &Time., &Error. FROM aggregated_log WHERE very_complex_condition(...)SELECT &Error.* FROM aggregated_log WHERE copy_of_very_complex_condition(...)

which as you correctly state is more performant and it is clearer what you are asking for in each query. The problem I see with this approach is that, for some use cases, the overhead of having to keep very_complex_condition and copy_of_very_complex_condition synchronized and the subtle bugs this introduces, may be greater than the performance penalty of asking for the extra columns. That is why we wanted to go for the flexibility of having the user decide what is the approach that makes the most sense for their concrete use case, even if in the vast majority of instances they split the queries as you correctly point out.

Lastly, regarding the following statement

It should be an error to not use your SQL statement correctly

I think again you are right and that would be the ideal case. However, with SQLair we are taking the approach of being a very lightweight layer which is as flexible as possible to allow the greatest amount of syntax out of the box. That is why we do not parse the SQL directly and why we do not report errors but let the queries fail later. In that same spirit, I think this change adds the same flexibility and makes the user responsible for using the API correctly in the same manner that he is responsible for the logic in his code (i.e. updating a value where it should be updated).

— Reply to this email directly, view it on GitHub https://github.com/canonical/sqlair/pull/106#issuecomment-1730016402, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRQ7M6LS3QGKSEIGTQI5TX3R26TANCNFSM6AAAAAA5BDLNQA . You are receiving this because you commented.Message ID: @.***>

niemeyer commented 1 year ago

I agree with John: the philosophy behind this isn't clear. We dont' want people fetching 10MB of data to pick up one byte out of it, to give you an extreme example, and an API that encourages you to do that silently while pretending to be friendly is problematic.

The simple answer for now is to have an actual statment that gets the one value being looked, and use that instead. We can revisit that in the future, but for now I'm pretty sure we don't want to open that door without more experience.

letFunny commented 1 year ago

Thanks Gustavo and John for the comments above, I see your point in the API being explicit. I am closing this PR and we can revisit it in the future.