Performance is close to 3X worse when large number of rows match in-clause vs few rows. Non-key column is used in filter so full scan will be done in either cases.
select count(*) from T where column1 in ([1000 values]) //only 1000 values of non-key column1 matches in-clause values
Output:
1000
Time: 8.722 sec(s)
select count(*) from T where column2 in ([1000 values]) //all values of non-key column2 matches values in in-clause values
Output:
5000000
Time: 23.654 sec(s)
Performance is close to 3X worse when large number of rows match in-clause vs few rows. Non-key column is used in filter so full scan will be done in either cases.
To repro, expand http://phoenix-bin.github.com/client/test/in_test.zip in phoenix/bin directory and run test.sh.
Output of test:
select count(*) from T where column1 in ([1000 values]) //only 1000 values of non-key column1 matches in-clause values Output: 1000 Time: 8.722 sec(s)
select count(*) from T where column2 in ([1000 values]) //all values of non-key column2 matches values in in-clause values Output: 5000000 Time: 23.654 sec(s)