ClickHouse / clickhouse-js

Official JS client for ClickHouse DB
https://clickhouse.com
Apache License 2.0
225 stars 27 forks source link

Query metrics or query id #130

Closed ultram4rine closed 1 year ago

ultram4rine commented 1 year ago

Hi everyone! I'm trying to implement showing query metrics for ultram4rine/sqltools-clickhouse-driver#281 and since I'm rewriting my driver using this official lib I have a question, is it possible to get query_id to use it in statement like:

SELECT query_duration_ms,
  read_rows,
  read_bytes
FROM system.query_log
WHERE query_id = 'abfba2b6-9488-4c45-9186-d86b4c0d8bba'
  AND type = 2;

If no, is it possible to implement getting query id or metrics in this lib?

slvrtrn commented 1 year ago

if you are using JSON format for queries (non-streamable), it should be possible to obtain the metadata like in this example:

ts-node --transpile-only --project tsconfig.dev.json examples/select_json_with_metadata.ts 

{
  meta: [ { name: 'number', type: 'UInt64' } ],
  data: [ { number: '0' }, { number: '1' } ],
  rows: 2,
  rows_before_limit_at_least: 2,
  statistics: { elapsed: 0.001263917, rows_read: 2, bytes_read: 16 }
}

As for the query_id, I will look into what's possible

slvrtrn commented 1 year ago

@ultram4rine I would like to know if something like this will help.

query_id will be obtainable from query, insert and exec

https://github.com/ClickHouse/clickhouse-js/blob/ada4dae87f4a7335a4509dc13c5a38eb11e2886a/__tests__/integration/query_log.test.ts#L13-L60

It is not yet merged, see https://github.com/ClickHouse/clickhouse-js/pull/132, but we can add it in 0.0.12

ultram4rine commented 1 year ago

@slvrtrn while query_id is pretty useful, using system.query_log is a bit awkward. On the other hand, with JSON format, I get statistics just like in console or in web interface + meta columns, but I can't use stream. I wonder, if it is possible to add this statistics and meta to stream, like it was in @alpa/node-clickhouse lib?

slvrtrn commented 1 year ago

@ultram4rine like it is done here? https://github.com/apla/node-clickhouse#selecting-large-dataset

stream.on('end', () => {
  console.log(
    rows.length,
    stream.supplemental.rows,
    stream.supplemental.rows_before_limit_at_least, // how many rows in result are set without windowing
  )
})
ultram4rine commented 1 year ago

@slvrtrn yes, but maybe stats like

stream.on('end', () => {
  console.log(
    stream.statistics.elapsed_time,
    stream.statistics.read_rows,
    stream.statistics.read_bytes,
  )
})
slvrtrn commented 1 year ago

@ultram4rine I'll check if it's possible with HTTP protocol and streams

slvrtrn commented 1 year ago

@ultram4rine our best bet here is to decode the X-ClickHouse-Progress header (see the entry about it in https://clickhouse.com/docs/en/interfaces/http#default-database) plus add some internal stopwatch to mimic the elapsed_time behavior as it is not supported there and emit this info in the stream; however, I am not sure how useful it will be.

mshustov commented 1 year ago

r best bet here is to decode the X-ClickHouse-Progress header (see the entry about it in https://clickhouse.com/docs/en/interfaces/http#default-database)

If we want to use X-ClickHouse-Progress header, we will have to combine it with wait_end_of_query=1, which defeats the purpose of response streaming. see https://clickhouse.com/docs/en/interfaces/http/#response-buffering

stream.statistics.elapsed_time, stream.statistics.read_rows, stream.statistics.read_bytes,

I'd suggest waiting until the library implements the native protocol to receive that information from ClickHouse server. I'd prefer not to calculate these values manually. For example, there might be a discrepancy in the read_bytes since values have different representations in C++ and nodejs runtime.

s add some internal stopwatch to mimic the elapsed_time

The value we calculate on the client side will include network overhead. The proposed logic can be implemented in the user app:

console.time('label')
await client.query(...)
console.timeEnd('label')
slvrtrn commented 1 year ago

@ultram4rine query_id is now obtainable as of 0.0.12.