Open rafiss opened 4 months ago
FWIW, I manually came up with a list of all the agg-metrics by looking for usages of the agg metrics library on master (see below). I believe the generated docs make use of the metric metadata, which AFAIK does not include information on whether a metric is an agg-metric. We might have to do something like add type assertions against the individual metrics in the code gen to get hold of this info.
Current list of agg-metrics (created manually by me, it's possible I missed a few):
- changefeed.error_retries
- changefeed.emitted_messages
- changefeed.emitted_batch_sizes
- changefeed.filtered_messages
- changefeed.message_size_hist
- changefeed.emitted_bytes
- changefeed.flushed_bytes
- changefeed.flushes
- changefeed.size_based_flushes
- changefeed.parallel_io_queue_nanos
- changefeed.parallel_io_pending_rows
- changefeed.parallel_io_result_queue_nanos
- changefeed.parallel_io_in_flight_keys
- changefeed.sink_io_inflight
- changefeed.sink_batch_hist_nanos
- changefeed.flush_hist_nanos
- changefeed.commit_latency
- changefeed.admit_latency
- changefeed.backfill_count
- changefeed.backfill_pending_ranges
- changefeed.running
- changefeed.batch_reduction_count
- changefeed.internal_retry_message_count
- changefeed.schema_registry.retry_count
- changefeed.schema_registry.registrations
- changefeed.aggregator_progress
- changefeed.checkpoint_progress
- changefeed.lagging_ranges
- changefeed.cloudstorage_buffered_bytes
- changefeed.kafka_throttling_hist_nanos
- tenant.consumption.request_units
- tenant.consumption.kv_request_units
- tenant.consumption.read_batches
- tenant.consumption.read_requests
- tenant.consumption.read_bytes
- tenant.consumption.write_batches
- tenant.consumption.write_requests
- tenant.consumption.write_bytes
- tenant.consumption.sql_pods_cpu_seconds
- tenant.consumption.pgwire_egress_bytes
- tenant.consumption.external_io_egress_bytes
- tenant.consumption.external_io_ingress_bytes
- tenant.consumption.cross_region_network_ru
- livebytes
- keybytes
- valbytes
- rangekeybytes
- rangevalbytes
- totalbytes
- intentbytes
- lockbytes
- livecount
- keycount
- valcount
- rangekeycount
- rangevalcount
- intentcount
- lockcount
- intentage
- gcbytesage
- sysbytes
- syscount
- abortspanbytes
- kv.tenant_rate_limit.num_tenants
- kv.tenant_rate_limit.current_blocked
- kv.tenant_rate_limit.read_batches_admitted
- kv.tenant_rate_limit.write_batches_admitted
- kv.tenant_rate_limit.read_requests_admitted
- kv.tenant_rate_limit.write_requests_admitted
- kv.tenant_rate_limit.read_bytes_admitted
- kv.tenant_rate_limit.write_bytes_admitted
- security.certificate.expiration.ca
- security.certificate.expiration.client-ca
- security.certificate.expiration.ca-client-tenant
- security.certificate.expiration.ui-ca
- security.certificate.expiration.client
- security.certificate.expiration.client-tenant
- security.certificate.expiration.node
- security.certificate.expiration.node-client
- security.certificate.expiration.ui
- jobs.row_level_ttl.span_total_duration
- jobs.row_level_ttl.select_duration
- jobs.row_level_ttl.delete_duration
- jobs.row_level_ttl.rows_selected
- jobs.row_level_ttl.rows_deleted
- jobs.row_level_ttl.num_active_spans
- jobs.row_level_ttl.total_rows
- jobs.row_level_ttl.total_expired_rows
- rpc.connection.healthy
- rpc.connection.unhealthy
- rpc.connection.inactive
- rpc.connection.healthy_nanos
- rpc.connection.unhealthy_nanos
- rpc.connection.heartbeats
- rpc.connection.failures
- rpc.connection.avg_round_trip_latency
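
For illustration, the type-assertion idea mentioned at the top of this comment could look roughly like the sketch below. It is not taken from the codebase: `Metric`, `Gauge`, `AggGauge`, `AggCounter`, `isAggMetric`, and `aggMetricNames` are hypothetical stand-ins defined here, and the pairing of metric names with types is only an assumption made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is a hypothetical stand-in for the interface that every registered
// metric satisfies; the real interface in the codebase may differ.
type Metric interface {
	GetName() string
}

// Gauge is a hypothetical plain (non-aggregated) metric.
type Gauge struct{ name string }

func (g *Gauge) GetName() string { return g.name }

// AggGauge and AggCounter are hypothetical stand-ins for aggregated metrics
// that can export labeled child metrics.
type AggGauge struct{ name string }

func (g *AggGauge) GetName() string { return g.name }

type AggCounter struct{ name string }

func (c *AggCounter) GetName() string { return c.name }

// isAggMetric reports whether m is one of the aggregated metric types,
// using a type switch (a compact form of repeated type assertions).
func isAggMetric(m Metric) bool {
	switch m.(type) {
	case *AggGauge, *AggCounter:
		return true
	default:
		return false
	}
}

// aggMetricNames returns the sorted names of all aggregated metrics in ms.
func aggMetricNames(ms []Metric) []string {
	var names []string
	for _, m := range ms {
		if isAggMetric(m) {
			names = append(names, m.GetName())
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// The name-to-type pairing below is assumed for the example.
	registry := []Metric{
		&AggGauge{name: "changefeed.running"},
		&Gauge{name: "example.plain_gauge"}, // hypothetical metric name
		&AggCounter{name: "changefeed.flushes"},
	}
	for _, name := range aggMetricNames(registry) {
		fmt.Println(name)
	}
}
```

A doc generator that already iterates over registered metrics could apply a check like `isAggMetric` to each one and emit an extra column or flag, assuming it has access to the concrete metric values and not just their metadata.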
Is your feature request related to a problem? Please describe.
The server.child_metrics.enabled cluster setting enables exporting child metrics with additional labels in Prometheus. There's no way of seeing which metrics would be affected if the setting is enabled.
Describe the solution you'd like
Document which metrics are affected. Ideally, this could be something that's documented automatically in docs/generated/metrics/metrics.html.
Describe alternatives you've considered
Look at usages of the AggGauge/AggCounter/AggHistogram/etc libraries in the code to get a sense of which ones are impacted.
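
As a rough sketch of that alternative, a small program could parse Go source and list struct fields whose type is one of the agg types. The `sample` source, the `aggmetric` and `metric` package qualifiers in it, and the `findAggFields` helper are assumptions made for this example. It also reports Go field names, not metric names, so mapping fields back to metric metadata would still be manual.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// aggTypeNames holds the type names mentioned in this issue.
var aggTypeNames = map[string]bool{
	"AggGauge":     true,
	"AggCounter":   true,
	"AggHistogram": true,
}

// sample is a made-up metrics struct used only to exercise the sketch.
const sample = `package example

type Metrics struct {
	Running *aggmetric.AggGauge
	Flushes *aggmetric.AggCounter
	Plain   *metric.Gauge
}
`

// findAggFields parses src and returns the sorted names of fields whose type
// is a (possibly pointer to a) package-qualified agg type.
func findAggFields(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var fields []string
	ast.Inspect(file, func(n ast.Node) bool {
		field, ok := n.(*ast.Field)
		if !ok {
			return true
		}
		typ := field.Type
		// Unwrap a leading pointer, e.g. *aggmetric.AggGauge.
		if star, ok := typ.(*ast.StarExpr); ok {
			typ = star.X
		}
		sel, ok := typ.(*ast.SelectorExpr)
		if !ok || !aggTypeNames[sel.Sel.Name] {
			return true
		}
		for _, name := range field.Names {
			fields = append(fields, name.Name)
		}
		return true
	})
	sort.Strings(fields)
	return fields, nil
}

func main() {
	fields, err := findAggFields(sample)
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Println(f)
	}
}
```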
Additional context
This question came up in: https://cockroachlabs.slack.com/archives/C012GFANG5R/p1715910844186129?thread_ts=1715900517.125659&cid=C012GFANG5R
Jira issue: CRDB-38839