datalust / seq-tickets

Issues, design discussions and feature roadmap for the Seq log server
https://datalust.co/seq

Indexed searches over high-cardinality fields #2077

Closed: nblumhardt closed this 6 months ago

nblumhardt commented 9 months ago

Seq 2024.1 includes disk-backed indexes for trace identifiers. In large data sets, we've seen these significantly speed up retrieval of all spans (events) in a trace. Where the whole data set would otherwise need to be scanned, minutes-long searches can drop to subsecond times.
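To illustrate why an index helps here, the sketch below contrasts a full scan with an inverted index that maps each value of one property (a trace id, say) to the positions of the events carrying it. This is a hypothetical in-memory illustration, not Seq's disk-backed implementation. The event shape, the `TraceId` property name, and the `scan` and `FieldIndex` names are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical event shape: a flat dict of property name -> value.
Event = Dict[str, str]


def scan(events: List[Event], field: str, value: str) -> List[Event]:
    """Full scan: inspects every event, so cost grows with the whole data set."""
    return [e for e in events if e.get(field) == value]


class FieldIndex:
    """Hypothetical inverted index over one property: value -> event positions."""

    def __init__(self, events: List[Event], field: str) -> None:
        self.events = events
        self.postings: Dict[str, List[int]] = defaultdict(list)
        # Build once: record where each distinct value occurs.
        for i, e in enumerate(events):
            v = e.get(field)
            if v is not None:
                self.postings[v].append(i)

    def lookup(self, value: str) -> List[Event]:
        """Touches only the events that match, however large the data set is."""
        # .get() avoids inserting empty entries into the defaultdict.
        return [self.events[i] for i in self.postings.get(value, [])]


# Usage: 10,000 spans spread across 1,000 hypothetical traces.
events = [{"TraceId": f"t{i % 1000}", "SpanId": str(i)} for i in range(10_000)]
index = FieldIndex(events, "TraceId")
assert index.lookup("t42") == scan(events, "TraceId", "t42")
```

The trade-off that makes high-cardinality indexing worth exposing as configuration is visible even in this toy: the index costs storage and build time for every distinct value, in exchange for lookups whose cost depends on the number of matches rather than the size of the data set.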

Needless to say, this completely transforms the Seq user experience in those cases. We'd love to put the same capability in the hands of all Seq users to apply as they see fit: to customer ids, tenant ids, transaction ids, email addresses, IP addresses, hostnames, and other high-cardinality identifiers that feature prominently in day-to-day searches.

In Seq 2024.2 our aim is to open up high-cardinality indexing through a new index configuration screen and API, and to make indexing cost visible through index statistics or similar quality metrics.

This continues on from #763, which improved the same scenario for ad-hoc searches using pre-filtering.

nblumhardt commented 8 months ago

We're fast-tracking part of this feature in 2024.2 via #2112. The scope and goals of this ticket are unchanged, and the remaining work should land in a similar timeframe to the original plan, but its release milestone is now 2024.3.