One optimization trick of multiple constraints in WHERE clause

AlaSQL / alasql

AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel.

http://alasql.org

MIT License

7.04k stars 659 forks source link

One optimization trick of multiple constraints in WHERE clause #938

Open hkmoon opened 7 years ago

hkmoon commented 7 years ago

Hi,

Recently, I have noticed that LIKE operation is quite slow with the other constraints. However, = the equal operation looks fast.

If you have a query with LIKE such as below:

SELECT * FROM table WHERE age = 10 AND no = 1 AND name LIKE '%moon%';

The above query takes longer than the below:

SELECT * FROM (SELECT * FROM table WHERE age = 10 AND no = 1) WHERE name LIKE '%moon%';

It would be great to optimize a query while parsing it. For example, the expensive operations should be carried on later after reducing the search space.

Thank you so much for all the hard works!!

mathiasrw commented 7 years ago

This is a great input. I will mark it so others can be inspired to contribute with code to help this out.

Do you have any tests on how much longer?

hkmoon commented 7 years ago

There is a dataset of 10 columns and 38079 rows. I did two trials. The optimized query showed 3141 ms and 3120 ms response time while the un-optimized one had 25509 ms and 28394 ms. It's 8.6 times faster.

mathiasrw commented 7 years ago

did you put indexes on the tables?

hkmoon commented 7 years ago

Yes, I used CREATE INDEX idx_1 on tableA ( A, B, C ); A and B are number type whereas C is string type.

By the way, do I have to drop the index when I drop the table or the index will be automatically removed when the table is dropped?

mathiasrw commented 7 years ago

I actually dont know. I would imagine that the index is removed together with removed content...

Please try out where you make an index for each of A, B and C.

kevinyen-oath commented 5 years ago

Observed the same behavior – chained where clause is slower than nested sub-queries with single condition in each query.

mathiasrw commented 5 years ago

I guess the query planning should take into account that comparing some elements is faster than others - meaning that if we filter things out with the fast ones the slower ones handles less data.