digital-fabric / extralite

Ruby on SQLite
http://www.rubydoc.info/gems/extralite
MIT License
255 stars 8 forks source link

Add callback for transforming unhandled binding parameters #33

Closed gschlager closed 11 months ago

gschlager commented 11 months ago

@noteflakes Thanks for inviting me as collaborator. Still, I'd like to get your feedback before I merge something like this PR. Happy to make changes to it. I still need to write tests for prepared statements and update the README.

This will make it easier to bind values like Date or Time by supplying a block that gets executed when you are trying to bind an unhandled parameter type.

require "date"
require "time"

db = Extralite::Database.new(':memory:')
db.on_unhandled_parameter = ->(value) do
  case value
  when Date
    value.iso8601
  when Time
    value.utc.iso8601
  else
    raise "don't know how to handle #{value.class}"
  end
end

db.query_single_value('select ?', Date.new(2023, 12, 24))
db.query_single_value('select ?', Time.new(2023, 12, 24, 20, 30, 0, '+02:00'))
noteflakes commented 11 months ago

The only thing I'm not clear about is the nested flag. What is its purpose?

gschlager commented 11 months ago

The nested flag is a breaking change. Currently it's possible to bind named parameters from nested hashes.

The following works because it binds y and z from the nested hash.

db.query('SELECT :x AS x, :y AS y, :z AS z', { x: 1, more: { y: 2, z: 3 } })
# => [{ x: 1, y: 2, z: 3 }]

I think that's an unexpected and undocumented behavior. I'd prefer if the previous code raises an error because it can't bind the more key. It would be better if developers could define how nested objects are bound.

For example, the following code would bind more as JSON string.

db.on_unhandled_parameter = ->(value) do
  case value
  when Hash
    value.to_json
  else
    raise "don't know how to handle #{value.class}"
  end
end

db.query('SELECT :x AS x, :more AS more', { x: 1, more: { y: 2, z: 3 } })
# => [{ x: 1, more: '{"y":2,"z":3}' }]

The original behavior can still be achieved by slightly changing how parameters are supplied.

db.query('SELECT :x AS x, :y AS y, :z AS z', { x: 1 }, { y: 2, z: 3 })
# => [{ x: 1, y: 2, z: 3 }]

db.query('SELECT :x AS x, :y AS y, :z AS z', { x: 1, y: 2, z: 3 })
# => [{ x: 1, y: 2, z: 3 }]

If we don't want this to be a breaking change, then I guess we could allow binding of nested objects (hash, struct) unless a on_unhandled_parameter block is supplied.

What do you think?

noteflakes commented 11 months ago

I think that's an unexpected and undocumented behavior. I'd prefer if the previous code raises an error because it can't bind the more key. It would be better if developers could define how nested objects are bound.

Indeed that's an unintended behaviour.

For example, the following code would bind more as JSON string. ...

> db.query('SELECT :x AS x, :more AS more', { x: 1, more: { y: 2, z: 3 } })
# => [{ x: 1, more: '{"y":2,"z":3}' }]

Personally I prefer being more explicit at the callsite, i.e. not using on_unhandled_parameter:

> db.query('SELECT :x AS x, :more AS more', { x: 1, more: { y: 2, z: 3 }.to_json })
> # => [{ x: 1, more: '{"y":2,"z":3}' }]

The original behavior can still be achieved by slightly changing how parameters are supplied.

I don't mind breaking the behaviour for nested hashes.

noteflakes commented 11 months ago

Thinking about it some more, I think it would be beneficial if we came up with more use cases. The JSON one is to me not so interesting, as it can be done very easily at the callsite (see my previous comment).

I think a more interesting use case is where you use a custom object to supply parameters, for example an ORM instance:

foo_instance = Foo.new(x: 1, y: 2, z: 3)
db.query('insert into foo values (:x, :y, :z)', foo_instance)

To make this possible we can do one of the following:

  1. Keep your approach:
db.on_unhandled_parameter { |v| v.is_a?(Foo) ? foo.values : (raise "Invalid parameter") }
  1. Automatically call #to_h on the object:
    class Foo
    def to_h
    @values
    end
    end

The first approach is more flexible. The second approach is simpler and relies on a well-known standard Ruby API that should already exist on container objects. There's pros and cons for each approach. But maybe before deciding which approach is better for us we should come up with more use cases. What do you think?

gschlager commented 11 months ago

My main use case are Date and Time objects as shown in my PR description. Currently I have many places were data is generated and I don't want format the dates and times as strings in all those places. The requirement to convert them into a different data type (TEXT, INTEGER or REAL) seems like an implementation detail that should be hidden from the caller. So, my choices are to either add an intermediate layer that does the transformation or to transform the values when the parameters are bound. I prefer the latter.

Calling to_h on objects that implement it seems like a good default.

noteflakes commented 11 months ago

Calling to_h on objects that implement it seems like a good default.

I'm working on code to do that. Will push momentarily.

noteflakes commented 11 months ago

I've also added tests for nested hashes and structs, which at the moment fail, as they expect a ParameterError to be raised. If nested hashes are not supported, don't we need to raise an exception?

gschlager commented 11 months ago

Thanks for your feedback and your code. I'll have a closer look tomorrow.

noteflakes commented 11 months ago

I think the main problem here is that if you pass a hash as a parameter, it can be interpreted as either an object that needs to be transformed in order to be used as a positional parameter, or as a set of named parameters:

hash = { foo: 1, bar: 2 }
db.query('select ?', hash) # hash should be converted using unhandled_parameter_proc
db.query('select :foo, :bar', hash) # hash treated as bunch of named parameters

In order to remove this ambiguity, we can do a simple heuristic where we scan the sql for ?, which means that parameters are treated as positional. Otherwise they are treated as named. Then again, this introduces a certain complexity and may lead to subtle bugs for users of the library that will be hard to understand.

Here's an idea that might solve the problems we're encountering with this PR:

Add a way to transform all parameters

Instead of transforming just the parameters Extralite doesn't recognize, provide an API to transform all of them:

db.parameter_transform do |v|
  case v
  when Time
    v.to_i
  when Hash
    v.to_json
  else
    v
  end
end

This proc is then fed whatever is passed to Database#query:

db.query_single_value('select ?', Time.now) #=> 1703195102
db.query_single_value('select ?', foo: 1) #=> '{ "foo": 1 }'
db.query_single_value('select ?', 'foo') #=> 'foo'

When defined, the result of the parameter_transform proc is bound normally as is done now, including support for hashes for expressing named parameters. Note that in the example above you can't use named parameters, since the given hash is always transformed to JSON:

db.query_single_value('select :foo', foo: 1) #=> '{ "foo": 1 }'

But the following is possible:

db.parameter_transform do |v|
  case v
  when Time
    v.to_i
  else
    v # if a hash is returned it is interpreted as a set of named parameters
  end
end

db.query_single_value('select ?', Time.now) #=> 1703195102
db.query_single_value('select :foo', foo: 1) #=> 1

Some possible solutions to mixing together named parameters and transforming hashes to JSON:

I know this is not ideal either, but it seems to me a much simpler solution that has a more consistent behaviour. I'll try to put together a separate PR for this idea.

gschlager commented 11 months ago

I was trying to avoid calling a proc for every parameter for performance reasons, but I didn't benchmark it, so maybe it's not as bad as I thought it is. I guess we can give it a try.

gschlager commented 11 months ago

I'm closing this. It's not worth the trouble and complexity. I'm probably going to implement it in a lightweight DB layer for my usecase. The less complex alternative in https://github.com/digital-fabric/extralite/pull/45 adds a performance penaltiy to every parameter binding which is even worse for performance.