apache / horaedb

Apache HoraeDB (incubating) is a high-performance, distributed, cloud native time-series database.
https://horaedb.apache.org
Apache License 2.0
2.65k stars 205 forks source link

support either nanosecond timestamp precision or composite primary keys #1597

Open grenade opened 6 days ago

grenade commented 6 days ago

Describe This Problem

my use case is trade data analytics for signal or strategy indicators. a trade has:

i am struggling to document a table definition in horaedb that does not lose trades. since there are many trades which occur within the same millisecond. nanosecond precision appears to be unavailable.

i have attempted to work around the millisecond precision limitation by using a composite primary key (time, exchange, quote, id). each base gets it's own table which allows for multi-exchange series and combining quotes where their underlying values are equivalent (ie: usdt == usdc). however it appears that horaedb accepts the definition but ignores any value in the composite key that is not the timestamp. this results in all but the first trade in a given millisecond being discarded as a duplicate.

Proposal

i would appreciate ideas about how to utilise nanosecond timestamp precision or how to correctly define a composite key that will actually work.

Additional Context

here's the table definition i have tried:

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

here's an example of source data where base is btc, quote is usd and exchange is coinbase: https://api.exchange.coinbase.com/products/BTC-USD/trades?limit=10&after=1000

jiacai2050 commented 5 days ago

Hi, nanoseconds is not supported in horaedb now.

For your case, I think you could add another column to bypass this limit,

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    time2 uint64 NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, time2, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

Here we add another column time2 to primary keys, and it's defined as time(nanoseconds) % 1e6, so rows with same time value won't be overwritten.