Open swoehrl-mw opened 2 years ago
I'll look at this. About the error when creating the index, I don't expect it to happen if there is no other DDL happening at the same time. This is probably because the index was invalid (you can check with

```sql
select pg_class.relname from pg_index join pg_class on pg_index.indexrelid = pg_class.oid where not indisvalid;
```

), so you have to drop it and recreate it. `temp_file_limit` can be increased, but with the `events_device_ts` index we should not need it. I'll check all queries to see what we can do for 500 million rows. `avg()` is not pushed down; I've opened #13336. The workaround is replacing `avg()` with `sum()` and `count()`:

```sql
SELECT max(temperature), sum(temperature)/count(temperature)::numeric avg, min(temperature) FROM events;
```
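As a sanity check on the arithmetic behind this workaround, here is a minimal sketch using SQLite as a stand-in for the real `events` table (the table contents here are made up for illustration); it shows that `sum()/count()` reproduces `avg()`:

```python
import sqlite3

# In-memory stand-in for the benchmark's events table (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (device_id TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("dev1", 20.0), ("dev1", 21.5), ("dev2", 18.0)],
)

row = conn.execute(
    "SELECT max(temperature),"
    "       sum(temperature) / count(temperature),"  # the workaround
    "       avg(temperature),"                       # the built-in
    "       min(temperature)"
    " FROM events"
).fetchone()

print(row)  # sum/count (row[1]) matches avg (row[2])
```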
Hi @FranckPachot. Thank you for your suggestions. I did another test and made sure the index was created correctly. I tried `sum`/`count` instead of `avg`, but it did not make the query faster; it still runs into the `temp_file_limit`. I did confirm the query uses the index though:

```
postgres=# explain SELECT device_id, temperature from (SELECT device_id, temperature, timestamp=max(timestamp) over (partition by device_id) newest FROM events) e where newest;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Subquery Scan on e  (cost=0.00..141.50 rows=500 width=36)
   Filter: e.newest
   ->  WindowAgg  (cost=0.00..131.50 rows=1000 width=37)
         ->  Index Only Scan using events_device_ts on events  (cost=0.00..114.00 rows=1000 width=44)
(4 rows)
```
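To see why this plan can spill to temp files, here is a rough Python analogue of what the WindowAgg step does (sample data and names are made up for illustration): it has to read every row of a device before it knows `max(timestamp)` for that partition, so the whole set gets materialized:

```python
# Made-up sample rows: (device_id, timestamp, temperature).
rows = [
    ("dev1", 1, 20.0), ("dev1", 2, 21.5),
    ("dev2", 3, 19.0), ("dev2", 1, 18.0),
]

# max(timestamp) over (partition by device_id): full scan of all rows.
max_ts = {}
for device, ts, _ in rows:
    max_ts[device] = max(max_ts.get(device, ts), ts)

# where timestamp = max(...): keep only the newest row per device.
newest = [(d, temp) for d, ts, temp in rows if ts == max_ts[d]]
print(sorted(newest))
# -> [('dev1', 21.5), ('dev2', 19.0)]
```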
Yes, I thought the PostgreSQL WindowAgg was smarter, as there's no need to materialize all rows into temp files to get the max() over a sorted set. I got it to work with `set temp_file_limit='50GB'`. However, here is something smarter.
This should get the same result with limited reads: it gets to the last device/last timestamp, then to the next device's last timestamp, and so on, with a fast IndexOnlyScan returning only one row for each device:
```sql
with recursive device_last_event as (
  -- start at the last device_id / last timestamp
  (
    select
      last_device_last_event.device_id,
      last_device_last_event.temperature
    from events last_device_last_event
    order by
      last_device_last_event.device_id desc,
      last_device_last_event.timestamp desc
    limit 1
  )
  union all
  -- from this last device_id / last timestamp, get the next one
  select
    next_device_last_event.device_id,
    next_device_last_event.temperature
  from device_last_event,
  lateral (
    select device_id, temperature from events
    where device_id < device_last_event.device_id
    order by device_id desc, timestamp desc limit 1
  ) as next_device_last_event
)
select * from device_last_event
order by device_id, temperature;
```
This uses a well-known PostgreSQL workaround for a loose index scan, and it works in YugabyteDB 2.15.1 (it was not working in 2.15.0).
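The seek pattern of that recursive CTE can be sketched in Python, with a sorted list standing in for the index (data and helper names here are hypothetical): each iteration is a single probe that lands on the last entry of the next smaller `device_id`:

```python
from bisect import bisect_left

# Made-up rows sorted by (device_id, timestamp), as the index stores them.
rows = sorted([
    ("dev1", 1, 20.0), ("dev1", 2, 21.5),
    ("dev2", 1, 18.0), ("dev2", 3, 19.0),
    ("dev3", 5, 25.0),
])
keys = [(d, t) for d, t, _ in rows]

def newest_per_device(rows, keys):
    """Emulate the recursive-CTE loose scan: start at the very last
    index entry, then repeatedly seek to the last entry of the next
    smaller device_id -- one probe per device, not per row."""
    out = []
    i = len(rows) - 1                      # last device / last timestamp
    while i >= 0:
        device, _, temperature = rows[i]
        out.append((device, temperature))
        # seek: last entry with device_id < current device_id
        i = bisect_left(keys, (device,)) - 1
    return sorted(out)

print(newest_per_device(rows, keys))
# -> [('dev1', 21.5), ('dev2', 19.0), ('dev3', 25.0)]
```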
Hi @FranckPachot. I've finally found the time to test your workaround and can confirm the query is very fast (0.08s
, basically what I would expect of an index-only query). I've updated the repo README with updated results for Yugabyte. But to be fair I have not included that last result in the comparison table but made a notation of it in the explanations.
Thanks again for your help.
Hi @swoehrl-mw, the latest release of YugabyteDB (2.17.1.0-b439) has implemented loose index scan for DISTINCT, which means that the newest-per-device query can be blazing fast if written as:
```sql
select * from
(
  select distinct device_id from events where device_id > '' -- need a where clause to get an IndexScan
) devices,
lateral (
  select temperature from events
  where device_id = devices.device_id
  order by timestamp desc limit 1
) as events
;
```
I don't think this optimization helps on PostgreSQL. Maybe on TimescaleDB.
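The two phases of that query can be sketched the same way (again with made-up data and a sorted list standing in for the index): a skip scan collecting the DISTINCT device ids, then one lateral probe per device for its newest row:

```python
from bisect import bisect_right

# Made-up rows sorted by (device_id, timestamp), like the index.
rows = sorted([
    ("dev1", 1, 20.0), ("dev1", 2, 21.5),
    ("dev2", 1, 18.0), ("dev2", 3, 19.0),
    ("dev3", 5, 25.0),
])
keys = [(d, t) for d, t, _ in rows]

# Phase 1: DISTINCT device_id via a skip scan -- after reading one
# device, jump past all of its entries instead of scanning them.
devices, i = [], 0
while i < len(rows):
    device = rows[i][0]
    devices.append(device)
    i = bisect_right(keys, (device, float("inf")))  # skip to next device

# Phase 2: LATERAL per-device probe -- the last entry of each device
# is its newest timestamp, since the 'index' is sorted.
result = []
for device in devices:
    j = bisect_right(keys, (device, float("inf"))) - 1
    result.append((device, rows[j][2]))

print(result)
# -> [('dev1', 21.5), ('dev2', 19.0), ('dev3', 25.0)]
```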
Hi @FranckPachot. Thanks for the update. I will include that in the next benchmark round.
With the newest version of YugabyteDB and some optimizations, insert performance for YSQL has increased dramatically to about 295000 inserts/s in our test case (copy mode, batch size 1000), bringing it up to par with the YCQL interface (which also increased slightly, to about 290000).
But the query performance is unexpectedly slow. For the use case with 500 million rows, the test queries take a long time, and one of them fails with:

```
psycopg2.errors.ConfigurationLimitExceeded: temporary file size exceeds temp_file_limit (1048576kB)
```
One more issue: when creating the index `events_device_ts`, sometimes the following error happens:

```
psycopg2.errors.InternalError_: Aborted: ERROR: Query error: schema version mismatch for table 000033e6000030008000000000004104: expected 1, got 0
```

The index still seems to get created, as running the `CREATE INDEX` statement again yields `NOTICE: relation "events_device_ts" already exists, skipping`.
For reference, the commands:
```shell
helm install yugabyte yugabytedb/yugabyte -f dbinstall/yugabyte-values.yaml --version 2.15.0
python run.py insert --target yugabyte_sql -w 16 -r 1 --num-inserts 31250000 --batch 1000 --primary-key sql
python run.py query --target yugabyte_sql -w 1 -r 1 --extra-option create_indices=true
```
@FranckPachot, I hope your offer of help is still valid? I would be grateful for some pointers on how to deal with the `temp_file_limit` and any ideas on how to speed up the other queries. With the latest optimizations, insert performance is formidable, so I would love to get query performance to match.