GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0

The primary key in GreptimeDB isn't the primary key in other databases #4920

Open nicecui opened 1 week ago

nicecui commented 1 week ago

What type of enhancement is this?

API improvement, User experience

What does the enhancement do?

According to Wikipedia, a primary key uniquely identifies a row in a relational table, and that is how the term is understood across the database industry. In GreptimeDB, however, the primary key specifies the tag columns of a time-series table, and a combination of tag values does not uniquely identify a row.

This difference in the meaning of the primary key between GreptimeDB and the industry standard leads to additional communication costs. When people see "primary key", they often assume it is the unique row identifier, which can lead to mistakes. GreptimeDB engineers then need to explain that the primary key does not function as users expect.
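
To make the difference concrete, here is a minimal sketch using the table from this issue. The sample values are made up for illustration, and the sketch assumes the table's default (non-append) behavior, where rows are deduplicated on the primary key columns together with the time index.

```sql
-- host and method_name are tags (the "primary key"); ts is the time index.
CREATE TABLE grpc_latencies (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  method_name STRING,
  latency DOUBLE,
  PRIMARY KEY (host, method_name)
);

-- Two rows with the same primary key values but different timestamps.
-- The values are illustrative only.
INSERT INTO grpc_latencies (ts, host, method_name, latency) VALUES
  ('2024-10-01 00:00:00', 'host1', 'GetUser', 103.0),
  ('2024-10-01 00:00:05', 'host1', 'GetUser', 98.5);

-- Count rows per primary key value. With a relational primary key,
-- the second insert would be rejected and n could never exceed 1.
SELECT host, method_name, COUNT(*) AS n
FROM grpc_latencies
GROUP BY host, method_name;
```

Under these assumptions both inserts should succeed and the query should report more than one row for ('host1', 'GetUser'), which is exactly what a user who reads "PRIMARY KEY" in the relational sense would not expect.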

To address this issue, the table creation statement needs to be adjusted. For example, adding the time index to the primary key would align the primary key's behavior with users' expectations.

Change the following SQL:

CREATE TABLE grpc_latencies (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  method_name STRING,
  latency DOUBLE,
  PRIMARY KEY (host, method_name)
);

to

CREATE TABLE grpc_latencies (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  method_name STRING,
  latency DOUBLE,
  PRIMARY KEY (host, method_name, ts)
);

Implementation challenges

No response

MichaelScofield commented 1 week ago

Well, I suggest deprecating the term "primary key" in GreptimeDB and using "tag" instead, like this:

CREATE TABLE grpc_latencies (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  method_name STRING,
  latency DOUBLE,

  -- "PRIMARY KEY" is replaced with "TAG":
  TAG (host, method_name)
);