GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.36k stars 315 forks source link

the result of COUNT(DISTINCT <expression>) is incorrect in range query #5014

Closed 0neSe7en closed 1 week ago

0neSe7en commented 1 week ago

What type of bug is this?

Incorrect result

What subsystems are affected?

Query Engine

Minimal reproduce step

use the data from "quick-start" document.

CREATE TABLE grpc_latencies (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  method_name STRING,
  latency DOUBLE,
  PRIMARY KEY (host, method_name)
) with('append_mode'='true');
INSERT INTO grpc_latencies (ts, host, method_name, latency) VALUES
  ('2024-07-11 20:00:06', 'host1', 'GetUser', 103.0),
  ('2024-07-11 20:00:06', 'host2', 'GetUser', 113.0),
  ('2024-07-11 20:00:07', 'host1', 'GetUser', 103.5),
  ('2024-07-11 20:00:07', 'host2', 'GetUser', 107.0),
  ('2024-07-11 20:00:08', 'host1', 'GetUser', 104.0),
  ('2024-07-11 20:00:08', 'host2', 'GetUser', 96.0),
  ('2024-07-11 20:00:09', 'host1', 'GetUser', 104.5),
  ('2024-07-11 20:00:09', 'host2', 'GetUser', 114.0);

execute the following SQL

SELECT ts, count(distinct host) RANGE '5s' as h FROM grpc_latencies ALIGN '5s' by (method_name);

What did you expect to see?

ts count
1720688405000 2

What did you see instead?

ts count
1720688405000 8

which is the same as the result of

SELECT ts, count(*) RANGE '5s' as h FROM grpc_latencies ALIGN '5s' by (method_name);

the following SQL is correct

select date_bin(INTERVAL '5s', ts) as t, count(distinct host) as count from grpc_latencies group by t

What operating system did you use?

doesn't matter

What version of GreptimeDB did you use?

0.9.5

Relevant log output and stack trace

No response