Open fredrikIOT opened 10 months ago
Hi @fredrikIOT, appreciate your detailed feedback!
I noticed you also run SELECT * FROM "location1" limit 5 offset 1
and it's slower than SELECT count(*) FROM "location1";
I'm wondering if it is reproducible? And how large is the result set? According to the analyze result it has output_rows=100
100 rows. Would the SQL SELECT * FROM "location1" limit 5 offset 1
return 100 rows (without the ANALYZE
) or 5 rows
@fredrikIOT Can you share the SQL of creating table "location1"?
Hi @fredrikIOT, appreciate your detailed feedback!
I noticed you also run
SELECT * FROM "location1" LIMIT 5 offset 1
and it's slower thanSELECT count(*) FROM "location1";
I'm wondering if it is reproducible? And how large is the result set? According to the analyze result it hasoutput_rows=100
100 rows. Would the SQLSELECT * FROM "location1" limit 5 offset 1
return 100 rows (without theANALYZE
) or 5 rows
@waynexia I need to correct an error in my previous message. I referenced the wrong query and result. Here's the accurate information:
Correct Query:
EXPLAIN ANALYZE SELECT * FROM "location1" ORDER BY ts DESC LIMIT 5 offset 1
Correct Result:
MergeScanExec: peers=[4629974745088(1078, 0), ], metrics=[output_rows=5, ready_time=379.715193ms, finish_time=9.498069549s, first_consume_time=9.498045694s]
While reviewing this, I've noticed the significant impact of the ORDER BY
clause on the query performance. Here are some observations:
SELECT * FROM "location1" ORDER BY ts DESC LIMIT 5 offset 1
uses 5372 msSELECT * FROM "location1" LIMIT 5 offset 1
uses 213 msSELECT COUNT(*) FROM "location1";
uses 2787 msSo, I guess the question of whether COUNT
takes more time than LIMIT
is no longer valid. However, it is reproducible that COUNT
is slower when using ORDER BY
.
@fredrikIOT Can you share the SQL of creating table "location1"?
@MichaelScofield Here's the SQL for creating the "location1" table:
CREATE TABLE IF NOT EXISTS "location1" (
"location_uid" STRING NULL,
"room_uid" STRING NULL,
"log_uid" STRING NULL,
"sensor_name" STRING NULL,
"value" DOUBLE NULL,
"ts" TIMESTAMP(3) NOT NULL,
TIME INDEX ("ts"),
PRIMARY KEY ("location_uid", "room_uid", "log_uid", "sensor_name")
) ENGINE=mito WITH(regions = 1)
As I mentioned in my previous message, the performance issue seems to be primarily due to the ORDER BY
clause. However, having the table structure might provide further insights into any other potential performance implications.
So, I guess the question of whether COUNT takes more time than LIMIT is no longer valid. However, it is reproducible that COUNT is slower when using ORDER BY.
Yes, that's correct. The main reason that SELECT count(*) FROM "location1";
runs faster than SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100;
is the ORDER BY ts
. Unfortunately we haven't optimized for this scenario -- though we know it's important for time series scenarios. We used to have one optimization that could speed up ORDER BY ts
by a fascinating percentage, but it's not adapted yet after a major refactor :face_with_peeking_eye:
We can keep this issue open as a tracker.
Do we have any plan to optimize for this case? @waynexia @evenyag
I'm afraid that I don't have enough time to finish it in v0.8, maybe I can optimize some cases in v0.9.
I'm afraid that I don't have enough time to finish it in v0.8, maybe I can optimize some cases in v0.9.
We implemented some optimize rules and execution plans for this case.
v0.10.0 will include these changes. Now order by time index
will sort each time window independently instead of sorting all rows in a table.
What type of bug is this?
Performance issue
What subsystems are affected?
Standalone mode
Minimal reproduce step
SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100;
.What did you expect to see?
Expected efficient use of
LIMIT
andOFFSET
, where the query execution should be quick, only processing the first 100 rows.What did you see instead?
The query execution is slow, taking multiple seconds (8634 ms) despite using
LIMIT
. It seems that theLIMIT
clause does not stop the query processing after reaching the limit but only restricts the number of rows returned.Performance Metrics:
SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100;
. Metrics:SELECT count(*) FROM "location1";
:SQL Analyze:
What operating system did you use?
Docker (Ubuntu)
What version of GreptimeDB did you use?
docker image: greptime/greptimedb:nightly-20240126-8bade8f8e4
Relevant log output and stack trace
No response