forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License
559 stars 227 forks source link

Performance degradation using in clause #704

Open mujtabachohan opened 10 years ago

mujtabachohan commented 10 years ago

Performance is close to 3X worse when large number of rows match in-clause vs few rows. Non-key column is used in filter so full scan will be done in either cases.

To repro, expand http://phoenix-bin.github.com/client/test/in_test.zip in phoenix/bin directory and run test.sh.

Output of test:

select count(*) from T where column1 in ([1000 values]) //only 1000 values of non-key column1 matches in-clause values Output: 1000 Time: 8.722 sec(s)

select count(*) from T where column2 in ([1000 values]) //all values of non-key column2 matches values in in-clause values Output: 5000000 Time: 23.654 sec(s)