Generic "predicate query" to decide whether to collect a metric?

ringerc commented 5 months ago

I'm looking at adopting pg_exporter as a potential replacement for the CloudNative-PG built-in scraper I currently use. It looks pretty impressive other than the lack of test cover and test facilities.

One of the few things it doesn't do that the CNPG scraper does is permit the use of a generic boolean expression query to decide "should I run this metric query or not" (added by https://github.com/cloudnative-pg/cloudnative-pg/pull/4503). pg_exporter has tags, but the tags must be specified from the outside by whatever provisions the service, and apply to all DBs being scraped.

I'm wondering if you'd be open to submission of a patch to implement that, assuming that my initial PoC with this scraper finds it otherwise a good fit for my needs.

I need to express things like "collect this metric set if the foo extension is installed and is at least version 2.1", "collect this metric only when running on (some specific vendor postgres flavour)" or "only try to collect this if this specific table exists and has this specific column".

This isn't possible in plain postgres SQL dialect due to its lack of bare procedural blocks and the inability of a DO block to return a result set. Hacks with refcursors can be used to get a result from a DO block but only with multiple statements, which this scraper doesn't support (and probably shouldn't as it opens a whole can 'o worms).

I propose an extension to the syntax like:

predicate_queries:
  - predicate_query: |
      SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_schema = 'pg_catalog' AND table_name = 'foo')
query: |
  SELECT a, b, c FROM pg_catalog.some_vendor_extension;

The predicate queries would be run in the order specified. If any returns zero rows or a 1-column boolean false, the metric collection is skipped. If it returns true, collection proceeds to check the next predicate (if any) or to collect the metric. A multi-column result or multi-row result logs an error the first time it runs then disables the metric for the lifetime of the process.

The purpose of multiple queries is to permit things like (a bit contrived):

predicate_queries:
  - predicate_query: |
      SELECT 1 FROM pg_extension WHERE extname = 'foo';
  - predicate_query: |
      SELECT foo_is_enabled FROM foo.foo_is_enabled()

where it's first necessary to know that some database object exists before attempting to access it, since Pg requires that names must resolve at parse/plan time so you can't lazy-bind possible-undefined names in e.g. a CASE.

Predicate query results would support the ttl key and if set, use the cache. If the main metric query reports an error, that would automatically invalidate the predicate query cache. E.g.

predicate_queries:
  - predicate_query: |
      SELECT 1 FROM pg_extension WHERE extname = 'foo';
    ttl: 300

I'm also tempted to add separate positive and negative cache TTLs since it's common to have a metric initially not scrape-able due to an extension not yet being created, but once it's scrape-able once it stays that way. It's probably unnecessary though, and exponential backoff might be better if it proves to be required.

If I get the chance I may be able to add named query indirection so one predicate query set can be used for multiple scrapers to reduce verbosity. I'm not sure if it'll be worth it, will see.

Thoughts? The in-house implementation of this logic turned out to be very simple to do, and should translate well to this scraper.

Vonng commented 5 months ago

I think predicate query is an excellent idea! It could bring a lot of flexibility.

Here are some of my rough thoughts:

Should we allow multiple Predicate Queries? I can understand the point of adding multiple check queries. In that case, can we condense all the Predicate Queries into a single one using SQL's Boolean short-circuit logic? Since readability isn't important here, I think using a single Predicate Query might offer better performance and maintainability.
The changes to the config file should be forward-compatible. We could add a predicate field to the Collector definition. If this field is present, the corresponding predicate query will be executed, and branch logic will be applied. Personally, I feel that maintaining a separate set of reusable Predicate Queries might be redundant because most default Collectors should use the existing mechanism.
TTL is necessary, but we could use a fixed value like 100ms or the same as the main query. The check should not take too long, so a fixed small value could be ok.

Of course, these are just my rough ideas. I’m very open to further discussion, and I would greatly welcome any implementations or contributions. 😊

ringerc commented 5 months ago

I've written a proof of concept PR for this: https://github.com/Vonng/pg_exporter/pull/47

Vonng / pg_exporter

Generic "predicate query" to decide whether to collect a metric? #41