FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.23k stars 213 forks source link

Regression(?): UNION allows to select data of different types (null, date, double precision, string) [CORE5022] #5310

Open firebird-automations opened 8 years ago

firebird-automations commented 8 years ago

Submitted by: @pavel-zotov

Consider following totally stupid query:

select null from rdb$database union all select current_date from rdb$database union all select 3.1415926 from rdb$database union all select 'qwerty' from rdb$database;

Output in 2.5.x and WI-T3.0.0.30566 Firebird 3.0 Alpha 1:

Statement failed, SQLSTATE = HY004 SQL error code = -104 -Datatypes are not comparable in expression UNION

Output in: WI-T3.0.0.30809 Firebird 3.0 Alpha 2 WI-T3.0.0.31374 Firebird 3.0 Beta 1 WI-T3.0.0.31896 Firebird 3.0 Beta 2 WI-V3.0.0.32179 Firebird 3.0 Release Candidate 1:

- is:

<null> 2015-11-18 3.1415926 qwerty

PS. I'm not sure is this behaviour regression or no. Postgres raises error for both following snippets:

select null union all select cast('now' as date) union all select 3.1415926

select null union all select 3.1415926 union all select 'qwerty'

What SQL standard says about it ?

firebird-automations commented 8 years ago
Modified by: @pavel-zotov Version: 3\.0 RC 1 \[ 10584 \] Version: 3\.0 Beta 2 \[ 10586 \] Version: 3\.0 Beta 1 \[ 10332 \] Version: 3\.0 Alpha 2 \[ 10560 \] Component: Engine \[ 10000 \]
firebird-automations commented 8 years ago

Commented by: @dyemanov

Quite interesting case. This is what our code says:

// This function is made to determine a output descriptor from a given list // of expressions according to the latest SQL-standard that was available. // (ISO/ANSI SQL:200n WG3:DRS-013 H2-2002-358 August, 2002). // // The output type is figured out as based on this order: // 1) If any datatype is blob, returns blob; // 2) If any datatype is a) varying or b) any text/cstring and another datatype, returns varying; // 3) If any datatype is text or cstring, returns text; // 4) If any datatype is approximate numeric then each datatype in the list shall be numeric // (otherwise an error is thrown), returns approximate numeric; // 5) If all datatypes are exact numeric, returns exact numeric with the maximum scale and the // maximum precision used. // 6) If any datatype is a date/time/timestamp then each datatype in the list shall be the same // date/time/timestamp (otherwise an error is thrown), returns a date/time/timestamp. // // If a blob is returned, and there is a binary blob in the list, a binary blob is returned. // // If a blob/text is returned, the returned charset is figured out as based on this order: // 1) If there is a OCTETS blob/string, returns OCTETS; // 2) If there is a non-(NONE/ASCII) blob/string, returns it charset; // 3) If there is a ASCII blob/string, a numeric or a date/time/timestamp, returns ASCII; // 4) Otherwise, returns NONE.

So it looks like it should work without errors and VARCHAR should be returned. Prior to v3.0 Alpha/Beta, unions were processed in pairs and DATE was compared to NUMERIC at the second iteration, causing an error. v3.0 flattens the union tree before processing and thus it checks four datatypes at once, and this doesn't lead to an error because CHAR is detected (rule 2 above).

From another side, only MySQL seems to support such a query and it's not the best candidate to look at in terms of the SQL compliance ;-)

Oracle: ORA-01790: expression must have same datatype as corresponding expression PGSQL: ERROR: UNION types date and numeric cannot be matched DB2: SQL0415N The data types of corresponding columns are not compatible in a fullselect that includes a set operator

So my conclusion is that it's not a regression but we need to double check the SQL specification to ensure that our behavior is correct.