Add telemetry

polRk commented 3 months ago

Use case

Collect traces, metrics, and logs

Describe the solution you'd like

Use OpenTelemetry to collect metrics, logs, and traces

Describe the alternatives you've considered

Expose metrics as an object via a client method (example: chClient.metrics() # [metric{}, metric{}])
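To make this alternative concrete, here is a minimal sketch of what such a metrics() snapshot could return. Everything in it (the Metric shape, the QueryStats class, the metric names) is hypothetical and not part of @clickhouse/client. It only keeps per-method call counts and total durations in memory and returns them as an array of plain objects.

```typescript
// Hypothetical metric shape for a chClient.metrics() method; not an existing API.
interface Metric {
  name: string
  labels: Record<string, string>
  value: number
}

// Keeps running totals per client method (query, insert, ...).
class QueryStats {
  private readonly counts = new Map<string, number>()
  private readonly totalMs = new Map<string, number>()

  // Call once per finished request with its duration in milliseconds.
  record(method: string, durationMs: number): void {
    this.counts.set(method, (this.counts.get(method) ?? 0) + 1)
    this.totalMs.set(method, (this.totalMs.get(method) ?? 0) + durationMs)
  }

  // Snapshot of everything collected so far, as an array of plain metric objects.
  metrics(): Metric[] {
    const result: Metric[] = []
    this.counts.forEach((count, method) => {
      result.push({ name: 'queries_total', labels: { method }, value: count })
      result.push({
        name: 'query_duration_ms_sum',
        labels: { method },
        value: this.totalMs.get(method) ?? 0,
      })
    })
    return result
  }
}
```

After record('query', 10) and record('query', 5), metrics() would contain { name: 'queries_total', labels: { method: 'query' }, value: 2 } alongside the matching duration sum of 15.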

Additional context

slvrtrn commented 3 months ago

Can you please provide a more detailed description of your use case? Do you maybe have a good example of another language client library doing that?

polRk commented 3 months ago

Can you please provide a more detailed description of your use case?

Collect the query count and query duration, monitor the performance of the ClickHouse library itself, and expose traces.

https://opentelemetry.io.

Do you maybe have a good example of another language client library doing that?

http://effect.website

maxdeichmann commented 2 months ago

@slvrtrn I'd also be highly interested. Prisma provides a tracing feature that includes metrics and spans, and I think an OTel integration would be perfect here. Ultimately, I want to be able to track which ClickHouse queries are slow.

slvrtrn commented 2 months ago

Can this be implemented as a standalone add-on (maybe as a package in this repo, like packages/client-node-otel), or do you need more detailed metrics from within the client internals?

The reason I am asking is that it's a bit concerning to add a ~1.9M dependency (assuming we need this) to the main client, which is ~500KB on its own, but not everyone needs this functionality.

maxdeichmann commented 2 months ago

The reason I am asking is that it's a bit concerning to add a ~1.9M dependency (assuming we need this) to the main client, which is ~500KB on its own, but not everyone needs this functionality.

I think having this as its own dependency is totally fine.

polRk commented 2 months ago

The reason I am asking is that it's a bit concerning to add a ~1.9M dependency

Those are constants for naming by convention; see @opentelemetry/semantic-conventions.

polRk commented 2 months ago

Can this be implemented as a standalone add-on (maybe as a package in this repo, like packages/client-node-otel), or do you need more detailed metrics from within the client internals?

Tracing needs a context, and that is exactly what has to be implemented at the library function level.

For metrics, it would be enough if I could read them (in OpenTelemetry format) from the ClickHouse client using my OpenTelemetry metrics SDK:

ClickHouse client <- (reading metrics in OpenTelemetry format) <- OpenTelemetry SDK <- ... processing

The ClickHouse client must be a MetricProducer; see https://opentelemetry.io/docs/specs/otel/metrics/sdk/#metricproducer

slvrtrn commented 2 months ago

This makes sense; however, I am currently busy with other projects. Would you like to create a PR for this feature?

polRk commented 2 months ago

Would you like to create a PR for this feature?

Yes, of course, but I need a technical specification and some introductory information about the ClickHouse client library. Where should I start? (I'm thinking of https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/src/config.ts#L217)

slvrtrn commented 2 months ago

@polRk, DM me in the community Slack; I could do a short intro.

constb commented 1 month ago

It would be best if metrics were exposed in a way that does not depend on OpenTelemetry and its conventions. I currently collect metrics for Prometheus by wrapping methods of the ClickHouse client. Nothing special: just counters for the number of queries and the number of errors, and a histogram of query execution times.

Honestly, it would suffice if the ClickHouse client were an event emitter sending start/finish/error events…
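A minimal sketch of that idea, assuming a hypothetical QueryEvents helper (nothing like it exists in the client today): each wrapped call emits a start event, then either a finish or an error event carrying the elapsed time in milliseconds. The start/finish/error names follow the comment above; the class, the wrap() signature, and the event fields are assumptions for illustration only.

```typescript
// Hypothetical event shapes; @clickhouse/client does not emit these today.
type QueryEvent =
  | { type: 'start'; method: string }
  | { type: 'finish'; method: string; durationMs: number }
  | { type: 'error'; method: string; durationMs: number; error: unknown }

type Listener = (event: QueryEvent) => void

class QueryEvents {
  private readonly listeners: Listener[] = []

  // Register a callback for every start/finish/error event.
  on(listener: Listener): void {
    this.listeners.push(listener)
  }

  private emit(event: QueryEvent): void {
    this.listeners.forEach((listener) => listener(event))
  }

  // Wrap an async function: emit 'start', then 'finish' or 'error' with the elapsed time.
  wrap<A extends unknown[], R>(
    method: string,
    fn: (...args: A) => Promise<R>,
  ): (...args: A) => Promise<R> {
    return async (...args: A): Promise<R> => {
      const startedAt = Date.now()
      this.emit({ type: 'start', method })
      let result: R
      try {
        result = await fn(...args)
      } catch (error) {
        this.emit({ type: 'error', method, durationMs: Date.now() - startedAt, error })
        throw error // re-throw so callers still see the failure
      }
      this.emit({ type: 'finish', method, durationMs: Date.now() - startedAt })
      return result
    }
  }
}
```

The finish event is emitted outside the try block, so a listener that throws on finish is not misreported as a failed query.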

slvrtrn commented 1 month ago

Honestly, it would suffice if the ClickHouse client were an event emitter sending start/finish/error events…

I like that this does not require extra dependencies. It also looks like it would then be possible to add the necessary wrappers on the application side (we could also provide copy-pasteable examples for OTel/Prometheus).
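As an illustration of such an application-side wrapper, here is a sketch of a listener that turns hypothetical start/finish/error events into Prometheus-style numbers: a query counter, an error counter, and a cumulative duration histogram. The event type is declared here so the block stands alone; the QueryMetrics class and the bucket bounds are arbitrary assumptions, and in a real setup a library such as prom-client would hold these values instead.

```typescript
// Hypothetical event shape (declared here so the example is self-contained).
type QueryEvent =
  | { type: 'start'; method: string }
  | { type: 'finish'; method: string; durationMs: number }
  | { type: 'error'; method: string; durationMs: number }

// Prometheus-style aggregation done entirely in application code.
class QueryMetrics {
  queries = 0 // counter: started calls
  errors = 0 // counter: failed calls
  count = 0 // finished calls (successful or not); plays the role of the +Inf bucket
  sumMs = 0 // sum of all observed durations
  readonly bucketBoundsMs: number[]
  readonly bucketCounts: number[]

  // Upper bounds of the histogram buckets in milliseconds (arbitrary defaults).
  constructor(bucketBoundsMs: number[] = [10, 50, 100, 500, 1000]) {
    this.bucketBoundsMs = bucketBoundsMs
    this.bucketCounts = bucketBoundsMs.map(() => 0)
  }

  handle(event: QueryEvent): void {
    if (event.type === 'start') {
      this.queries++
      return
    }
    if (event.type === 'error') this.errors++
    this.observe(event.durationMs)
  }

  // Cumulative buckets: an observation increments every bucket whose bound it does not exceed.
  private observe(ms: number): void {
    this.count++
    this.sumMs += ms
    this.bucketBoundsMs.forEach((bound, i) => {
      if (ms <= bound) this.bucketCounts[i]++
    })
  }
}
```

Keeping the aggregation outside the client is what avoids the extra dependency: the client would only need to emit events, and each application decides whether they feed Prometheus, OpenTelemetry, or something else.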

@constb, can you please provide more details on how exactly you collect metrics in your use case? That could be useful to draft a possible implementation. If it's not shareable in public, you could also DM me in the community Slack.

constb commented 1 month ago

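@slvrtrn honestly, it's very straightforward. The code is similar to this:

import { createClient } from '@clickhouse/client';
import type { Counter, Histogram } from 'prom-client';

// prom-client metrics, all labelled by client ID
interface ClientMetrics {
  queriesCounter: Counter<'clientId'>;
  errorsCounter: Counter<'clientId'>;
  performanceHistogram: Histogram<'clientId'>;
}

// `config`, `metrics` and `clientId` come from the surrounding application code
const client = createClient(config);
client.exec = wrapWithMetrics(client, client.exec, metrics, clientId);
client.command = wrapWithMetrics(client, client.command, metrics, clientId);
client.query = wrapWithMetrics(client, client.query, metrics, clientId);
client.insert = wrapWithMetrics(client, client.insert, metrics, clientId);

// Returns a drop-in replacement for `originalMethod` that counts calls and errors
// and records the call duration in the histogram.
function wrapWithMetrics<M extends (...args: any[]) => Promise<unknown>>(
  client: object,
  originalMethod: M,
  { queriesCounter, performanceHistogram, errorsCounter }: ClientMetrics,
  clientId: string,
): M {
  // The wrapper has to be async, otherwise `await` below is a syntax error
  return (async (...args: Parameters<M>) => {
    const end = performanceHistogram.startTimer({ clientId }); // start timing
    try {
      queriesCounter.inc({ clientId });
      return await originalMethod.apply(client, args);
    } catch (err: unknown) {
      errorsCounter.inc({ clientId });
      throw err;
    } finally {
      end(); // observe the elapsed time for successful and failed calls alike
    }
  }) as M;
}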

I typed this code in the comment field, hopefully I didn't miss anything :)

maxdeichmann commented 2 weeks ago

@slvrtrn is there a way to also get database insights into the telemetry? I would be interested in rows read from disk, the number of threads used, CPU usage, and the like.

slvrtrn commented 2 weeks ago

@maxdeichmann, you could use the X-ClickHouse-Summary response header to a certain extent. For example:

import { createClient } from '@clickhouse/client'

void (async () => {
  const client = createClient()
  const rs = await client.query({
    query: 'SELECT number FROM system.numbers LIMIT 5',
    format: 'JSONEachRow'
  })
  const rows = await rs.json<{ number: string }>()
  console.log('Rows:', rows)
  console.log('Summary:', rs.response_headers['x-clickhouse-summary'])
  await client.close()
})()

This prints:

Rows: [
  { number: '0' },
  { number: '1' },
  { number: '2' },
  { number: '3' },
  { number: '4' }
]
Summary: {"read_rows":"5","read_bytes":"40","written_rows":"0","written_bytes":"0","total_rows_to_read":"5","result_rows":"0","result_bytes":"0","elapsed_ns":"2100708"}

However, please note that the summary might be incorrect without the wait_end_of_query=1 setting (see https://github.com/ClickHouse/ClickHouse/issues/12122), which should be used with caution (see, e.g., https://github.com/ClickHouse/ClickHouse/issues/46426#issuecomment-1431358301).
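Since every value in the summary header is a JSON string, a small helper can turn it into numbers, for example to derive a read rate from read_rows and elapsed_ns. This is a sketch: parseSummary and readRowsPerSecond are hypothetical helpers, the field list is taken from the output above, and the wait_end_of_query caveat still applies to the numbers themselves.

```typescript
// Numeric view of the X-ClickHouse-Summary header; field names taken from the example output.
interface QuerySummary {
  read_rows?: number
  read_bytes?: number
  written_rows?: number
  written_bytes?: number
  total_rows_to_read?: number
  result_rows?: number
  result_bytes?: number
  elapsed_ns?: number
}

// ClickHouse encodes every summary value as a JSON string, so convert each one to a number.
function parseSummary(header: unknown): QuerySummary | undefined {
  if (typeof header !== 'string') return undefined // header missing or not a plain string
  const raw = JSON.parse(header) as Record<string, string>
  const summary: Record<string, number> = {}
  for (const key of Object.keys(raw)) {
    summary[key] = Number(raw[key])
  }
  return summary as QuerySummary
}

// Rows read per second of elapsed_ns, if both fields are present.
function readRowsPerSecond(summary: QuerySummary): number | undefined {
  if (summary.read_rows === undefined || !summary.elapsed_ns) return undefined
  return summary.read_rows / (summary.elapsed_ns / 1e9)
}
```

With the header from the example above, readRowsPerSecond returns roughly 2380 rows per second (5 rows in about 2.1 ms). In the earlier snippet, the helper would be called as parseSummary(rs.response_headers['x-clickhouse-summary']).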

As for more advanced statistics such as memory and thread usage, they are not included in the response at all; you will need to check system.query_log for that. It wouldn't realistically be possible to build this into the client, as the logs are flushed periodically and are not available right away.
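If you do need those numbers, one option is to look the query up in system.query_log by its ID after the log has been flushed. Below is a sketch of a helper that builds such a parameterized query. The helper name is hypothetical, and it assumes a ClickHouse version in which ProfileEvents is a Map column (older versions stored it differently), so the column names should be checked against the system.query_log documentation for your server version.

```typescript
// Builds a parameterized lookup of one query's resource usage in system.query_log.
// Hypothetical helper; column availability depends on the ClickHouse server version.
function queryLogStatsQuery(queryId: string): {
  query: string
  query_params: { queryId: string }
} {
  const query = `
    SELECT
      query_duration_ms,
      read_rows,
      read_bytes,
      memory_usage,
      length(thread_ids) AS threads_used,
      ProfileEvents['OSCPUVirtualTimeMicroseconds'] AS cpu_time_us
    FROM system.query_log
    WHERE query_id = {queryId:String}
      AND type = 'QueryFinish'
    ORDER BY event_time DESC
    LIMIT 1`
  // The ID is sent as a query parameter, not interpolated into the SQL text.
  return { query, query_params: { queryId } }
}
```

The result could be passed to client.query({ ...queryLogStatsQuery(id), format: 'JSONEachRow' }), with id being the query_id you set on (or got back from) the original query. Because the log is flushed periodically, the row may not exist yet; SYSTEM FLUSH LOGS forces a flush but requires the corresponding privilege, so this fits offline analysis better than per-query telemetry.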

ClickHouse / clickhouse-js

Add telemetry #289