DiceDB / dice

DiceDB is an in-memory real-time database with SQL-based reactivity. It is hyper-optimized for building and scaling truly real-time applications on modern hardware while being a drop-in replacement for Redis.
https://dicedb.io/

Pipeline Query Language (querying metrics) Feature Request #712

Open rossb83 opened 3 days ago

rossb83 commented 3 days ago

Not sure if this is the right place to request a feature, but I am interested in finding an open-source database to which I could add the ability to query metrics with an M3QL/PromQL-like language. Would DiceDB benefit from this, or would it be too far out of scope? Under the hood it could be a simple M3QL-to-SQL translation (CTEs would need to be supported). Schema requirements would be up for debate, but there would need to be a timestamp column, a value column, and a column holding internal JSON for the metric's tags. Data ingestion would need to be streamed and immutable. Similar to the leaderboard demo, perhaps a QWATCH could be placed on a metric query to drive a Grafana-like time-series display.
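To make the "M3QL to SQL with CTEs" idea concrete, here is a minimal sketch that translates one tiny, hypothetical M3QL subset (`fetch name:<name> | summarize <window> sum`) into SQL and runs it. It assumes the schema proposed above (timestamp, value, JSON tags). SQLite via Python's `sqlite3` is used only as a convenient SQL engine, not as a stand-in for DiceDB's DSQL; `translate` and the `tag()` SQL helper are hypothetical names invented for this illustration.

```python
import json
import sqlite3

# Hypothetical schema from the proposal: timestamp, value, and tags as JSON text.
SCHEMA = "CREATE TABLE metrics (ts_ms INTEGER, value REAL, tags TEXT)"


def tag(tags_json: str, key: str):
    """Hypothetical SQL helper: return one tag value from the JSON tags column."""
    return json.loads(tags_json).get(key)


def translate(name: str, window_ms: int) -> tuple:
    """Translate the tiny M3QL subset `fetch name:<name> | summarize <window> sum`
    into a parameterised SQL query built around a CTE."""
    sql = (
        # The CTE keeps the matching rows and assigns each one to a time bucket
        # (integer division aligns buckets to multiples of window_ms) ...
        "WITH matched AS ("
        "  SELECT (ts_ms / ?) * ? AS bucket, value FROM metrics"
        "  WHERE tag(tags, 'name') = ?"
        ") "
        # ... and the outer query sums the values in each bucket.
        "SELECT bucket, SUM(value) FROM matched GROUP BY bucket ORDER BY bucket"
    )
    return sql, (window_ms, window_ms, name)


conn = sqlite3.connect(":memory:")
conn.create_function("tag", 2, tag)  # expose the Python helper to SQL
conn.execute(SCHEMA)
points = [
    (0, 1.0, {"name": "ingest-event"}),
    (60_000, 2.0, {"name": "ingest-event"}),
    (360_000, 4.0, {"name": "ingest-event"}),
    (120_000, 8.0, {"name": "other-event"}),
]
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(ts, value, json.dumps(tags)) for ts, value, tags in points],
)
sql, params = translate("ingest-event", 5 * 60 * 1000)
result = conn.execute(sql, params).fetchall()
print(result)  # [(0, 3.0), (300000, 4.0)]
```

A real translator would also need tag globs, grouping, and the other pipeline stages; the point here is only that each M3QL stage maps naturally onto one CTE layer.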

M3QL offers simple ways to express metric transforms, such as summarizing a metric into bucketed sums over time windows, or computing a metric as a percentage of a time-shifted version of itself. I would love to see it supported in the open-source world outside of Uber. All of this is possible in SQL as well, but the syntax becomes too cumbersome for a human to write.
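The two transforms named above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not M3QL's exact semantics: it assumes windows aligned to multiples of the window length, and `bucket_sum` and `percent_of_shifted` are hypothetical names.

```python
from collections import defaultdict


def bucket_sum(points, window_ms):
    """Sum (epoch_ms, value) points into windows aligned to multiples of window_ms."""
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts - ts % window_ms] += value
    return dict(sorted(buckets.items()))


def percent_of_shifted(series, shift_ms):
    """Express each bucket as a percentage of the bucket shift_ms earlier.
    Buckets whose earlier counterpart is missing or zero are skipped."""
    out = {}
    for ts, value in series.items():
        previous = series.get(ts - shift_ms)
        if previous:
            out[ts] = 100.0 * value / previous
    return out


series = bucket_sum([(0, 2.0), (60_000, 2.0), (300_000, 6.0), (600_000, 3.0)], 300_000)
print(series)                               # {0: 4.0, 300000: 6.0, 600000: 3.0}
print(percent_of_shifted(series, 300_000))  # {300000: 150.0, 600000: 50.0}
```

In practice the shift would typically be much longer than the window (e.g. comparing against the same time a week earlier); the short shift here just keeps the example small.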

I'm happy to give further ideas/support if there is interest here.

arpitbbhayani commented 3 days ago

Hello @rossb83,

M3QL fits nicely with DiceDB and is aligned with our vision as well. We are unsure about schema imposition (declaration and adherence), as we are a raw and crude KV store today. Inferring the schema and tracking changes to it might be the way to go.

But the next phase of QWATCH and DSQL will be to support complex functions, and going with M3QL is a good idea. Happy to discuss this further; feel free to block time on my calendar.

Once again, thanks for suggesting and nudging.

PS: although the event says "Evaluate DiceDB", we will keep it focused on M3QL support.

rossb83 commented 3 days ago

Actually, a KV store may make things slightly simpler. I assumed a SQL database because of the QWATCH example, but since DiceDB is a Redis replacement I should have done more due diligence. In this case you don't need a schema; M3DB is a KV store as well. One way to do it is to convert each ingested metric into a {m3id, epochMillis} pair as the key and the measurement/count as the value, where the m3id is a sorted, comma-delimited string of all the tags of the emitted metric. Querying then becomes simple if you know the m3id and the bucketed time range, and have enough compute to perform all the summation operations. The tricky part is that queries only contain a partial list of tags. For example, the query

fetch name:ingest-event error:true zone:dca* | sum shadow-mode | summarize 5m avg

has to match its tags against all M3IDs, such as:

* "env=staging,error=true,name=ingest-event,shadow-mode=off,zone=dca2"
* "env=production,error=true,name=ingest-event,shadow-mode=on,zone=dca1"
* ...
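The {m3id, epochMillis} key layout from this comment can be sketched as follows. A plain dict stands in for the KV store; in a Redis-compatible store the writes would presumably become SET with NX (to keep points immutable) and the reads GET or a batched MGET. The `|` separator between m3id and timestamp, the assumption that points land on step-aligned timestamps, and all function names are hypothetical choices for this illustration.

```python
def m3id(tags: dict) -> str:
    """Build an M3ID: every tag as key=value, sorted by key, comma-delimited."""
    return ",".join(f"{key}={value}" for key, value in sorted(tags.items()))


def kv_key(m3id_str: str, epoch_ms: int) -> str:
    """Hypothetical key encoding for the {m3id, epochMillis} pair."""
    return f"{m3id_str}|{epoch_ms}"


def ingest(store: dict, tags: dict, epoch_ms: int, value: float) -> None:
    """Write one point; points are immutable, so rewriting a key is rejected."""
    key = kv_key(m3id(tags), epoch_ms)
    if key in store:
        raise KeyError(f"point already written: {key}")
    store[key] = value


def query_sum(store: dict, m3id_str: str, start_ms: int, end_ms: int, step_ms: int) -> float:
    """Sum one known series over [start_ms, end_ms) with one point lookup per
    step-aligned timestamp; missing points count as zero."""
    return sum(
        store.get(kv_key(m3id_str, t), 0.0) for t in range(start_ms, end_ms, step_ms)
    )


store: dict = {}
tags = {"zone": "dca2", "name": "ingest-event", "shadow-mode": "off",
        "error": "true", "env": "staging"}
for t, v in [(0, 1.0), (10_000, 2.0), (20_000, 4.0)]:
    ingest(store, tags, t, v)

series = m3id(tags)
print(series)  # env=staging,error=true,name=ingest-event,shadow-mode=off,zone=dca2
print(query_sum(store, series, 0, 20_000, 10_000))  # 3.0
```

This only covers the easy case where the full m3id is known; it does nothing about resolving a partial tag list to M3IDs, which is the indexing problem.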

Uber uses Elasticsearch (ES) for this indexing layer (which may not be the best solution). OK, I will block time later in the week. Thanks, sounds exciting.
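As a rough illustration of the partial-tag matching problem, here is a minimal in-memory inverted index (tag key → tag value → M3IDs) that resolves the `fetch` stage of a query to concrete M3IDs. This is a simple technique swapped in for illustration, not how ES, M3, or DiceDB does it; `parse_fetch` and `TagIndex` are hypothetical names, and `fnmatch`-style globbing is assumed to approximate M3QL's `*` wildcard.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def parse_fetch(query: str) -> dict:
    """Extract {tag: glob pattern} from the `fetch` stage of a query such as
    'fetch name:ingest-event error:true zone:dca* | sum shadow-mode | ...'."""
    words = query.split("|", 1)[0].split()
    if not words or words[0] != "fetch":
        raise ValueError("expected a query starting with 'fetch'")
    return dict(word.split(":", 1) for word in words[1:])


class TagIndex:
    """Hypothetical in-memory inverted index: tag key -> tag value -> M3IDs."""

    def __init__(self) -> None:
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, m3id: str) -> None:
        # An M3ID is a sorted, comma-delimited list of key=value tags.
        for pair in m3id.split(","):
            key, value = pair.split("=", 1)
            self._postings[key][value].add(m3id)

    def lookup(self, filters: dict) -> set:
        """Return the M3IDs whose tags satisfy every {tag: glob} filter.
        Tags the query does not mention are unconstrained."""
        result = None
        for key, pattern in filters.items():
            # Union the postings of every value of this tag matching the glob.
            hits = set()
            for value, ids in self._postings.get(key, {}).items():
                if fnmatchcase(value, pattern):
                    hits |= ids
            # All filters must hold, so intersect across tags.
            result = hits if result is None else result & hits
            if not result:
                break
        return result or set()


index = TagIndex()
for m3id in [
    "env=staging,error=true,name=ingest-event,shadow-mode=off,zone=dca2",
    "env=production,error=true,name=ingest-event,shadow-mode=on,zone=dca1",
    "env=production,error=false,name=ingest-event,shadow-mode=on,zone=dca1",
    "env=production,error=true,name=ingest-event,shadow-mode=on,zone=sjc1",
]:
    index.add(m3id)

filters = parse_fetch(
    "fetch name:ingest-event error:true zone:dca* | sum shadow-mode | summarize 5m avg"
)
matched = sorted(index.lookup(filters))
print(matched)
```

Globs still require scanning every value of the globbed tag, which is one reason a dedicated index (ES or otherwise) becomes attractive as the number of distinct tag values grows.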