crystal-lang / crystal-sqlite3

SQLite3 bindings for Crystal
https://crystaldoc.info/github/crystal-lang/crystal-sqlite3
MIT License
139 stars 30 forks source link

Automatically set PRAGMAs using connection query params #85

Closed luislavena closed 1 year ago

luislavena commented 1 year ago

Introduce the flexibility to adjust certain PRAGMAs of the SQLite3 connection without having to hardcode those in your codebase (and wait for compilation).

This allows applications to use DATABASE_URL to dynamically fine tune their SQLite3 configuration.

The change complements #setup_connection that offers, via code, the option to perform queries on setup of each connection.

Only a few PRAGMAs necessary to allow more performant concurrent reads and reduce write locking.

These pragmas are detected and combined in a single SQL string to reduce to 1 the number of calls to sqlite3_exec C function.

There is no validation of supplied values as SQLite3 automatically ignores incorrect values for these pragmas.

Closes #84

References:

luislavena commented 1 year ago

Hello @bcardiff, didn't think it fully as I was following a bit what mattn/go-sqlite3 did, but also to differentiate those from connection pool options (Eg. initial_pool_size, max_pool_size, retry_attempts, etc), but I can change it to be exactly as the pragmas, which could simplify the code (and remove the case statement):

ALLOWED_PRAGMAS = {"busy_timeout", "cache_size", "foreign_keys", "journal_mode",  ...}
# ...

URI::Params.parse(query) do |key, value|
  next unless key.in?(ALLOWED_PRAGMAS)

  pragmas[key] = value
end

re: pragmas Hash, I wanted to avoid nesting too deep String.build, HTTP::Params.parse and case key, but I could change if we want to avoid that 😊

I also did that in case someone had the same pragma multiple times in the query parameters and that resulted in multiple SQL queries 😁

Thank you for your promptly response and let me know about the other changes. ❤️ ❤️ ❤️

bcardiff commented 1 year ago

Starting with the hash is fine, no need to change.

In crystal-mysql we recently added an encoding argument, but without any prefix as _.

I'm not sure what would be a good convention going forward. Maybe we should differentiate with _, x_, {driver}_ when the querystring represent a param of the specific driver. Another alternative for this particular case is that maybe pragma_{name} is a good pattern for all supported pragmas.

If there is no clear better alternative we can go ahead and keep _.

luislavena commented 1 year ago

Hello @bcardiff, I had some time to think about the PR, some notes:

  1. Prepending {driver}_, x_, or _ (my original code) seems to me too verbose. At the end you're already using the driver, specified by the scheme of the URI.
  2. On the specific case of pragma_, there are only a few of the pragmas that could be used via the C API (busy_timeout being the one I can think of).
  3. On my original assessment of URI#query, if the developer is already tweaking the connection pool options, query will be present and the Hash with the options will be allocated, even if there is no pragma. This will happen on every new connection being created, so a bigger pool, more pragma hashes will be allocated and then GC'ed.
  4. On my assessment of case versus simplified in?, while the former being verbose, it is also faster and could allow us better treatment of options in the future if needed (right now there is no sanitization as SQLite3 will ignore unknown values)

If is OK with you, I would suggest:

  1. Drop the _ to make it transparent that busy_timeout option is busy_timeout PRAGMA, nothing to learn or map in your head.
  2. Keep the hash that collects set pragmas, that way can combine multiple exec calls and only do once per pragma (last value wins)
  3. Keep the case, if we want to introduce validation of the provided values we could adjust it that way more easily.

Let me know what you think of this plan and I will introduce the changes.

Thank you in advance for your time!

❤️ ❤️ ❤️

bcardiff commented 1 year ago

Drop the _ to make it transparent that busy_timeout option is busy_timeout PRAGMA, nothing to learn or map in your head.

Sounds good. My motivation to have some prefix is that we could validate the options names and instead of ignoring when they don't match a known pragma we could tell the user that a typo was made. So far we've not been doing that because we add one option at a time and it didn't occur to me. So again, sounds good to drop the prefix since we are not going to start validating all the other options probably.

Keep the hash that collects set pragmas, that way can combine multiple exec calls and only do once per pragma (last value wins)

Unless we have a shared memory cache I don't see how this can be done. crystal-db passes only the connection string. Either way it sound like an early optimization to me. If anything, I would start by not using a hash.

Keep the case, if we want to introduce validation of the provided values we could adjust it that way more easily.

These would be validation of values, not keys right? I think we can defer for another PR if needed.


It seems that the main thing to decide is whether or not to keep the prefix. I'm fine dropping them if you feel they are too verbose.

bcardiff commented 1 year ago

Btw, to avoid the hash in a neat way I was thinking of

require "uri"

query = "foo=1&bar=first&bar=second&quz=ignore"
params = extract(query, foo: nil, bar: nil)

puts typeof(params) # => NamedTuple(foo: String | Nil, bar: String | Nil)
puts params         # => {foo: "1", bar: "second"}

def extract(query, **default : **T) forall T
  res = default

  URI::Params.parse(query) do |key, value|
    {% begin %}
      case key
      {% for key in T %}
      when {{ key.stringify }}
        res = res.merge({{key.id}}: value)
      {% end %}
      end
    {% end %}
  end  

  res
end

some copy of the values could be avoided, but it does the work.

luislavena commented 1 year ago

Hello @bcardiff, thanks for your patience coming back to this.

Unless we have a shared memory cache I don't see how this can be done. crystal-db passes only the connection string. Either way it sound like an early optimization to me. If anything, I would start by not using a hash.

Apologies for not making it clear, thought it was but re-reading my response seems didn't explain it.

The reason I was using the Hash was so under certain scenarios with double variables in the URI (Eg. ?foo=10&foo=20), the last one detected from the URI should be the definitive value. I've seen this in a few cases as quick overrides for testing before changing a long URI of configuration options 😓

This was done to avoid having the same PRAGMA multiple times when performing the call to SQLite3. Very naive approach that looking at your suggestion clearly addresses.

On that subject, your macro approach uses less memory and performs a bit faster than the Hash approach:

 extract (hash) 857.68k (  1.17µs) (± 1.28%)  721B/op   1.17× slower
extract (macro)   1.00M (995.58ns) (± 1.45%)  550B/op        fastest
Source code ```crystal require "benchmark" require "uri" def extract(query, **default : **T) forall T res = default URI::Params.parse(query) do |key, value| {% begin %} case key {% for key in T %} when {{ key.stringify }} res = res.merge({{key.id}}: value) {% end %} end {% end %} end res end def hash_extract(query) res = Hash(String, String).new URI::Params.parse(query) do |key, value| case key when "busy_timeout" res["busy_timeout"] = value when "cache_size" res["cache_size"] = value when "foreign_keys" res["foreign_keys"] = value when "journal_mode" res["journal_mode"] = value end end res end params = "busy_timeout=1000&garbage=foo&cache_size=-1000&foreign_keys=1&journal_mode=wal&busy_timeout=5000" Benchmark.ips do |x| x.report("extract (hash)") { hash_extract(params) } x.report("extract (macro)") { extract(params, busy_timeout: nil, cache_size: nil, foreign_keys: nil, journal_mode: nil, synchronous: nil, wal_autocheckpoint: nil, ) } end ```

As for the validation of input, we might want to introduce something that provides that support part of crystal-db for others to use, but I would say we can talk about that in a different PR. 😉

I decided to push additional changes to:

  1. Remove the underscore as prefix
  2. Use your macro approach

I hope is all good.

Thank you again for your time and your suggestions. ❤️ ❤️ ❤️

luislavena commented 1 year ago

One minor fix to the readme and we are good to go. I will commit those directly and wait a bit before merging.

Ah, great catch! missed when I was doing the search and replace! 🤦🏽

Thank you! ❤️ ❤️ ❤️