apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.78k stars 3.29k forks source link

Compare string type and numerical type will cause full table scan #3491

Open WingsGo opened 4 years ago

WingsGo commented 4 years ago

Describe the bug The sql like SELECT * FROM tbl WHERE olap_date='20200501' will not only scan partiiton p20200501 but scan the full table, it is because olap_date is int type and '20200501' is string literal, in following logic, olap_date and string literal will convert to double and compare, which lead to a full table scan.

https://github.com/apache/incubator-doris/blob/d64704599d338de290972c1324903cf03d7bab5c/fe/src/main/java/org/apache/doris/analysis/BinaryPredicate.java#L280-L310

Solutions: I think we should throw a DdlException when compare string literal to a numerical type to avoid this situation, Is there any solutions better?

kangkaisen commented 4 years ago

@WingsGo Hi, I think throw a DdlException is not acceptable. Most of query engines support int compare string. But maybe don't choose double as the common type. You could investigate Presto, Hive Or Spark. Thanks.

imay commented 4 years ago

First of all, throwing an exception is not acceptable. Because it will make many already exist queries fail. I think it is viable approach to handle this case in a specified case.

WingsGo commented 4 years ago

@imay @kangkaisen can we check the binary compare type, if one type is string and the other type is numerical such as int, bigint...., we just return the numercal type in getCmpType() function?

WingsGo commented 4 years ago

There is another case cause by the problem.for example: if we convert string and bigint to double, it may cause query result not match , because

Double d1 = new Double(2882303761517473127L);
Double d2 = new Double("2882303761517473267");
System.out.println(d1.equals(d2));    // true
MySQL [(none)]> select 2882303761517473127 = "2882303761517473127";
+-----------------------------------------------+
| 2.8823037615174733E18 = 2.8823037615174733E18 |
+-----------------------------------------------+
|                                             1 |
+-----------------------------------------------+
1 row in set (0.07 sec)

MySQL [(none)]> select 2882303761517473127 = "2882303761517473267";
+-----------------------------------------------+
| 2.8823037615174733E18 = 2.8823037615174733E18 |
+-----------------------------------------------+
|                                             1 |
+-----------------------------------------------+
1 row in set (0.07 sec)
imay commented 4 years ago

@imay @kangkaisen can we check the binary compare type, if one type is string and the other type is numerical such as int, bigint...., we just return the numercal type in getCmpType() function?

It is OK to optimize the case when one side of binary predicate is SlotRef and the other is a StringLiteral. After that work, the problem you mentioned will be resolved.