duct-framework / database.sql.hikaricp

Integrant methods for a SQL database connection pool
5 stars 6 forks source link

Disable query logging #3

Open danielytics opened 4 years ago

danielytics commented 4 years ago

I need to be able to disable the query logging on a per-query basis, as, in general, I like having the logging, but for some query I don't want logging.

The use cases are:

  1. Sometimes queries contain data which you don't want logged, like password hashes or Personally Identifiable Information (especially if logs are forwarded to a third party logging service, it may even not be legal to do so in some cases and jurisdictions)
  2. Sometimes queries contain data which is very large and would cause logs to be excessively large (eg I sometimes need to store binary data or larger text content)

In both cases, only some queries are affected, so it would be good to only disable logging for those, rather than globally. Technically only the affected values need to be masked, but I don't know of any way to do that.

Looking at the code, it looks like I can disable logging globally for all queries by not passing a logger to the component, but I'm not actually sure how -- I haven't explicitly set a logger for the hikaricp component in my duct config but it still seems to get one (maybe I'm missing something), otherwise I could work around this by having two components: one with logging and one without, although then it applies to an entire transaction (plus the awkwardness of having two DB objects in components that need this).

I can open a PR, if you give me your thoughts on this. My initial thought is to introduce a dynamic var which defaults to "log enabled" that can be set, something like: (binding [duct.database.sql.hikaricp/*logging* true] (sql-statement)) although I don't actually know how the proxy datasource works -- is afterQuery called in the same thread a the caller? Alternatively, perhaps the datasource can be temporarily unwrapped.

weavejester commented 4 years ago

What about adding another key to the Boundary record? For example, :unlogged-spec.

danielytics commented 4 years ago

Do you mean duct.database.sql/Boundary alongside spec, or do you mean inside that, alongside datasource? Would this be usable to turn off logging on a per-query basis? I'm not quite sure I understand, or do you mean for turning logging off "globally" for that component?

To answer my question from earlier:

I haven't explicitly set a logger for the hikaricp component in my duct config but it still seems to get one (maybe I'm missing something)

What I was missing is that I'm using module.sql, which sets the logger internally. Are you proposing to use :unlogged-spec as a way around that? That would work, although I was hoping to leave logging enabled and being able to disable it for only the queries that need it.

Although, that got me thinking, what if instead of suppressing the log entirely, you could pass it a format function that gets applied to queries before logging is done. Something like:

(afterQuery [_ exec-info query-infos]
  (let [elapsed (.getElapsedTime exec-info)
        queries (mapv logged-query query-infos)
        format-queries (or query-format-fn identity)]
    (if (= (count queries) 1)
      (log/log logger :info ::sql/query {:query (first (format-queries queries)), :elapsed elapsed})
      (log/log logger :info ::sql/batch-query {:queries (format-queries queries), :elapsed elapsed}))

As a global option (possibly alongside :unlogged-spec) and then leave it up to the user to decide what to do. For example, I can then myself implement a check with binding (assuming that afterQuery is run in the correct thread, I haven't yet tested it) or an atom, or if I happen to know a regex pattern for fields I want to mask, or if I want to truncate large fields, I can do that too.

What do you think?

weavejester commented 4 years ago

Do you mean duct.database.sql/Boundary alongside spec

Yep.

Are you proposing to use :unlogged-spec as a way around that? That would work, although I was hoping to leave logging enabled and being able to disable it for only the queries that need it.

Yes, that's what you should be able to do. Your functions should be written around the boundary, so when you want logging you use :spec, and when you don't want logging you use :unlogged-spec.

(extend-protocol Foobar
  sql/Boundary
  (get-foo [{:keys [spec]} id]
    (db/q spec ...))
  (get-bar [{:keys [unlogged-spec]} id]
    (db/q unlogged-spec ...)))
danielytics commented 4 years ago

Ah, I see, I understand now, thank you. I'll open a PR (either later tonight or tomorrow).

I still like the idea of being able to format the query vectors before logging, since it means I wouldn't need to give up logging completely for a query when I know what I want to truncate or mask, but that can be something that's addressed at a later time. So maybe you can keep it in mind and think about it? (Let me know if you want a PR) unlogged-spec fixes my immediate needs, so its not high priority.

Thanks for the help. I will leave this issue open until I've got a PR.

weavejester commented 4 years ago

Would the format query function just be there to format how the query appears in the logs? If so, I don't think that's a bad idea.

danielytics commented 4 years ago

Yes, I think just to preprocess/filter the query vector right before logging:

(log/log logger :info ::sql/query {:query (first (format-queries queries)), :elapsed elapsed})

For example, something like:

(defn my-query-formatter [query]
  (into [(first query)]
    (map #(if (> (count %) 50) "...truncated..." %) (rest query))))
:duct.database.sql/query {:query ["insert into foo (text) values (?);","...truncated..."], :elapsed 1}

Something like that anyway.

danielytics commented 4 years ago

I'm trying to see how to implement :unlogged-spec. Are you thinking something like this:

duct.database.sql:

(defrecord Boundary [spec unlogged-spec])

(defmethod ig/init-key :duct.database/sql [_ spec]
  (->Boundary spec nil))

duct.database.sql.hikaricp:

(defmethod ig/init-key :duct.database.sql/hikaricp
  [_ {:keys [logger connection-uri jdbc-url] :as options}]
  (let [datasource (-> (dissoc options :logger)
                       (assoc :jdbc-url (or jdbc-url connection-uri))
                       (hikari-cp/make-datasource))]
    (sql/->Boundary
      {:datasource (cond-> datasource logger (wrap-logger logger))}
      {:datasource datasource})))

Or perhaps leave Boundary unchanged and just assoc the unlogged spec to the hikaricp boundary record. Maybe something like:

(defmethod ig/init-key :duct.database.sql/hikaricp
  [_ {:keys [logger connection-uri jdbc-url unlogged-spec?] :as options}]
  (let [datasource (-> (dissoc options :logger)
                       (assoc :jdbc-url (or jdbc-url connection-uri))
                       (hikari-cp/make-datasource))]
    (cond-> (sql/->Boundary {:datasource (cond-> datasource logger (wrap-logger logger))})
      unlogged-spec? (assoc :unlogged-spec {:datasource datasource}))))
weavejester commented 4 years ago

The latter option - leave Boundary unchanged and add it in as an extra key. Not all SQL databases will have a logged/unlogged option after all.

You can also just add the :unlogged-spec key in without checking an optioon, as it doesn't make the call more expensive:

(defmethod ig/init-key :duct.database.sql/hikaricp
  [_ {:keys [logger connection-uri jdbc-url] :as options}]
  (let [datasource (-> (dissoc options :logger)
                       (assoc :jdbc-url (or jdbc-url connection-uri))
                       (hikari-cp/make-datasource))]
    (if logger
      (-> (sql/->Boundary {:datasource (wrap-logger datasource logger})
          (assoc :unlogged-spec {:datasource datasource}))
      (sql/->Boundary {:datasource datasource}))))
arichiardi commented 3 years ago

Hi there, I stopped by to say that the logging is useful, I would probably make the logging level configurable so that you can decide via other means if you want to show it or not...any thought?

Would that be a good PR to open in addition to this one?

weavejester commented 3 years ago

Hi there, I stopped by to say that the logging is useful, I would probably make the logging level configurable so that you can decide via other means if you want to show it or not...any thought?

Currently query logging can be turned on or off via the configuration. This (unfinished) PR is for selective logging - i.e. you may want to have logging for some queries but not for others. What scenario were you thinking wasn't covered?

arichiardi commented 3 years ago

I think the ability to decide the logging level is probably what I am after. However I think I am hijacking this is issue at this point. I will open a new one and describe things better..thank you!