VictoriaMetrics / VictoriaMetrics

VictoriaMetrics: fast, cost-effective monitoring solution and time series database
https://victoriametrics.com/
Apache License 2.0

victorialogs: query performance for non-existent substring #7233

Open peonqi opened 2 days ago

peonqi commented 2 days ago

Is your question request related to a specific component?

victorialogs

Describe the question in detail

I query 80GB of logs (20 million entries) within one hour, searching for a substring that does not exist. With the following query conditions, it returns 0 records:

 _stream:{stream="normal.target"}  log.namespace:="Production" _msg:~"cc641faf212b6xddgdews26d78" _time:[2024-10-08T13:34:42Z,2024-10-08T13:49:42Z) | sort by (_time desc ) | limit 30

It takes 20 seconds. For the same query, ClickHouse takes only 2 seconds. Both VictoriaLogs and ClickHouse are configured with 64-core CPUs and 256GB of memory.

Haleygo commented 2 days ago

_stream:{stream="normal.target"} log.namespace:="Production" _msg:~"cc641faf212b6xddgdews26d78" _time:[2024-10-08T13:34:42Z,2024-10-08T13:49:42Z) | sort by (_time desc ) | limit 30

There are two expensive subqueries in this particular expression: the substring filter _msg:~"cc641faf212b6xddgdews26d78" and the sort pipe sort by (_time desc ). Both of them slow down the query. It would be helpful to see the time spent on each subquery or pipe, similar to query tracing in VictoriaMetrics. cc @valyala
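Until per-pipe timing is available, one rough way to see which part dominates is to time the full query against variants that each drop one of the two suspected parts. The sketch below is only an illustration: it assumes a VictoriaLogs instance reachable at http://localhost:9428 (an assumed address, adjust as needed) and uses the /select/logsql/query HTTP endpoint. The helper names build_variants and time_query are hypothetical, not part of any VictoriaLogs client.

```python
import time
import urllib.parse
import urllib.request

# Filters and pipes copied from the query in this issue.
STREAM = '_stream:{stream="normal.target"}'
NAMESPACE = 'log.namespace:="Production"'
SUBSTRING = '_msg:~"cc641faf212b6xddgdews26d78"'
TIME_RANGE = '_time:[2024-10-08T13:34:42Z,2024-10-08T13:49:42Z)'
SORT_PIPE = 'sort by (_time desc)'
LIMIT_PIPE = 'limit 30'


def build_variants():
    """Return the full query plus variants that each drop one suspected part."""
    def join(filters, pipes):
        return " | ".join([" ".join(filters)] + pipes)

    return {
        "full": join([STREAM, NAMESPACE, SUBSTRING, TIME_RANGE], [SORT_PIPE, LIMIT_PIPE]),
        "no_sort": join([STREAM, NAMESPACE, SUBSTRING, TIME_RANGE], [LIMIT_PIPE]),
        "no_substring": join([STREAM, NAMESPACE, TIME_RANGE], [SORT_PIPE, LIMIT_PIPE]),
    }


def time_query(base_url, query):
    """POST the query and return (elapsed seconds, number of non-empty result lines)."""
    data = urllib.parse.urlencode({"query": query}).encode()
    start = time.perf_counter()
    with urllib.request.urlopen(base_url + "/select/logsql/query", data=data) as resp:
        rows = sum(1 for line in resp if line.strip())
    return time.perf_counter() - start, rows


if __name__ == "__main__":
    base_url = "http://localhost:9428"  # assumption: local single-node instance
    for name, query in build_variants().items():
        elapsed, rows = time_query(base_url, query)
        print(f"{name}: {elapsed:.2f}s, {rows} rows")
```

Wall-clock timings like these include network and response streaming, and the no_substring variant returns matching rows instead of an empty result, so the numbers are only a coarse hint about where the time goes.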

peonqi commented 2 days ago

Yes, the above query is indeed very resource-intensive, but with the same amount of data and the same query logic, ClickHouse takes much less time than VictoriaLogs. Is there still a lot of room for optimizing VictoriaLogs for this kind of query?