LucidDB / luciddb

DEFUNCT: See README
https://github.com/LucidDB/luciddb
Apache License 2.0
53 stars 24 forks source link

[FRG-292] incorrect type derivation for highly fractional numeric literal #580

Open dynamobi-build opened 12 years ago

dynamobi-build commented 12 years ago

[reporter="jvs", created="Wed, 5 Sep 2007 12:06:32 -0500 (GMT-05:00)"] The type is derived as DECIMAL(20,19), which is an illegal datatype, since our max precision is currently 19. Either it should be producing NUMERIC(19,19) after recognizing that the leading zero is insignificant, or it should be an error.

0: jdbc:luciddb:> explain plan with type for values 0.1234567890123456789;
+----------------------------------+
| column0 |
+----------------------------------+
| EXPR$0 DECIMAL(20, 19) NOT NULL |
+----------------------------------+

dynamobi-build commented 12 years ago

[author="jvs", created="Wed, 5 Sep 2007 12:09:51 -0500 (GMT-05:00)"] This one is debatable:

0: jdbc:luciddb:> values 0.12345678901234567890;
Error: From line 1, column 8 to line 1, column 29: Numeric literal '0.12345678901234567890' out of range (state=,code=0)

Should it discard the trailing zero before deciding?

dynamobi-build commented 12 years ago

[author="jvs", created="Wed, 5 Sep 2007 14:03:22 -0500 (GMT-05:00)"] Here's one with floating point; shouldn't this be rejected as out of range rather than rounded?

0: jdbc:luciddb:> values 0.123456789012345678901E0;
+----------------------+
|EXPR$0|
+----------------------+
| 0.12345678901234568 |
+----------------------+

dynamobi-build commented 12 years ago

[author="rchen", created="Mon, 10 Sep 2007 11:13:42 -0500 (GMT-05:00)"] one more on this:

values (cast (3.1234567890123456 as double));
'EXPR$0'
'3.1234567890123457'

values (cast (3.1234567890123457 as double));
'EXPR$0'
'3.1234567890123457'

dynamobi-build commented 12 years ago

[author="jvs", created="Mon, 10 Sep 2007 14:56:20 -0500 (GMT-05:00)"] I'm guessing that Rushan's last example is on the borderline due to the difference between what can be represented exactly in decimal vs binary (since the floating point mantissa is binary).

dynamobi-build commented 12 years ago

[author="rchen", created="Mon, 10 Sep 2007 16:03:53 -0500 (GMT-05:00)"] There seems to be problem when trying to compare the last digits of two double values. Perhaps farrago is missing the "approximate" comparison ability.

create table t (c double);

insert into t values(3.1234567890123454);
1 row affected (0.265 seconds)

select * from t;
'C'
'3.1234567890123457'
1 row selected (0.205 seconds)

select * from t where c = 3.1234567890123454;
'C'
No rows selected (0.271 seconds)

select * from t where c = 3.1234567890123454E0;
'C'
No rows selected (0.294 seconds)

insert into t values(3.1234567890123459);
1 row affected (0.388 seconds)

select * from t;
'C'
'3.1234567890123457'
'3.123456789012346'
2 rows selected (0.041 seconds)

select * from t where c = 3.1234567890123459;
'C'
'3.1234567890123457'
1 row selected (0.207 seconds)

select * from t where c = 3.1234567890123459E0;
'C'
'3.1234567890123457'
1 row selected (0.287 seconds)

Change both insert to use the double notation x.yyyyyy...E0, then the queries will return good comparison result.

Shouldn't insert values and comparison use type conversion rules that produce consistent results when it comes to decimal to double conversion?

dynamobi-build commented 12 years ago

[author="rchen", created="Mon, 10 Sep 2007 16:05:33 -0500 (GMT-05:00)"] Forgot the paste the table content after inserting two double literals:

insert into t values(3.1234567890123454E0);
select * from t;;
'C'
'3.1234567890123452'
1 row selected (0.188 seconds)

insert into t values(3.1234567890123459E0);
select * from t;
'C'
'3.1234567890123452'
'3.1234567890123457'
2 rows selected (0.262 seconds)

dynamobi-build commented 12 years ago

[author="rchen", created="Mon, 10 Sep 2007 20:06:13 -0500 (GMT-05:00)"] Another example of incorrect comparison result for double types:

truncate table t;
insert into t values(4600.0000E0);

select "C", cast(cast("C" as double)_cast(1000.0 as double)/cast(3.1234567890123456 as double) as double) from t where cast(cast("C" as double)_cast(1000.0 as double)/cast(3.1234567890123456 as double) as double) = 1472727.2732511677;
'C','EXPR$1'
No rows selected (0.239 seconds)

select "C", cast(cast("C" as double)_cast(1000.0 as double)/cast(3.1234567890123456 as double) as double) from t where cast(cast("C" as double)_cast(1000.0 as double)/cast(3.1234567890123456 as double) as double) = 1472727.2732511678;
'C','EXPR$1'
'4600.0','1472727.2732511677'
1 row selected (0.298 seconds)

dynamobi-build commented 12 years ago

[author="rchen", created="Tue, 11 Sep 2007 16:52:48 -0500 (GMT-05:00)"] This last example can be simplified as:

select cast(1472727.2732511678 as double) from t;
'EXPR$0'
'1472727.2732511677'
1 row selected (1.231 seconds)

The problem seems to be the code generated for
cast(decimal(17,10) as double)

The jave code fragment currently looks like this:

            private final void calc_oj_var3_f_0()
            {
                if (oj_var5 == null) {
                    oj_var5 = new net.sf.farrago.dynamic.stmt7.ExecutableStmt.Oj_inner_0();
                    oj_var5.assignFrom( 14727272732511677l );
                }
                if (Double.isInfinite( (double) ((double) oj_var5.value / 1.0E10d) )) {
                    throw net.sf.farrago.resource.FarragoResource.instance().Overflow.ex();
                }
                oj_var3.ID$0$EXPR$0 = (double) ((double) oj_var5.value / 1.0E10d);
            }

The division loses precision even if the original decimal can be precisely represented by double. In this example, the unscaled decimal 14727272732511677 first gets converted to double before the division to 14727272732511676 because 14727272732511677 cannot be accurately represented by the double type. After the division, the result 1472727.2732511676 again cannot be accurately represented so it is converted to 1472727.2732511675.

One alternative is to keep the representation "precise" as much as possible by using the BigDecimal class in java, and use the respective method provided by BigDecimal class to produce equivalent representation for the target type of the cast operator.

dynamobi-build commented 12 years ago

[author="rchen", created="Tue, 11 Sep 2007 16:54:29 -0500 (GMT-05:00)"] Correction:
the explanation above refers to this example:

select cast(1472727.2732511677 as double) from t;
'EXPR$0'
'1472727.2732511675'
1 row selected (1.231 seconds)

dynamobi-build commented 12 years ago

[author="jvs", created="Tue, 11 Sep 2007 19:46:55 -0500 (GMT-05:00)"] Adding John P as a watcher. EXPLAIN PLAN shows that integer division (/INT) is getting correctly chosen for the scaling operator, but for some reason the operands are double rather than integer, leading to the codegen captured by Rushan above. That seems to be where the problem lies.

0: jdbc:luciddb:> explain plan for
. . . . . . . . > select cast(1472727.2732511677 as double) from t;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| column0 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IterCalcRel(expr#0=[{inputs}], expr#1=[1472727.2732511677], expr#2=[Reinterpret($t1)], expr#3=[CAST($t2):DOUBLE NOT NULL], expr#4=[1E10], expr#5=[/INT($t3, $t4)], EXPR$0=[$t5]) |
| FennelToIteratorConverter |
| LcsRowScanRel(table=[[LOCALDB, S, T]], projection=[[LCS_RID]], clustered indexes=[[SYS$CLUSTERED_INDEX$T$C]]) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows selected (0.123 seconds)

dynamobi-build commented 12 years ago

[author="jvs", created="Tue, 11 Sep 2007 19:50:39 -0500 (GMT-05:00)"] Just noticed that Reinterpret operator is not showing target type in EXPLAIN PLAN above; FRG-293 logged for that.

dynamobi-build commented 12 years ago

[author="rchen", created="Tue, 11 Sep 2007 23:21:51 -0500 (GMT-05:00)"] Reinterpret($t1) in the plan above produces the unscaled "long" value stored in the Decimal(17,01) literal. In this example Reinterpret($t1) == 14727272732511677. The cast-to-double before division is probably because of the scaling factor 1.0E10 is interpreted as double value.

On the note of using the BigDecimal java type as internal represetation to preserve precision, I tried to change the CalcRel program into something like this:

IterCalcRel(expr#0=[{inputs}], expr#1=[1472727.2732511677], expr#2=[CAST($t1):DOUBLE NOT NULL], expr#4=[1E10], EXPR$0=[$t2])

and generate java code differently for "expr#2" using BigDecimal.doubleValue()(via a new EncodedSqlDecimal interface). Unfortunately, this only works for Java Calc. FennelCalc does not know how to evaluate [CAST(Decimal(17,10):DOUBLE NOT NULL]. For FennelCalc, expanding decimals into expressions of (unscaledPreciseInteger/scalingFacor) is still required.

dynamobi-build commented 12 years ago

[author="jpham", created="Wed, 12 Sep 2007 00:59:28 -0500 (GMT-05:00)"] Hi, leading and trailing zeroes and the case where precision=scale seem tricky. It would not surprise me to see problems there.

Numbers are treated as native floating point if they end in "E0", without a decimal intermediate.

In the past, interconversion of decimals and doubles was reliable for 15 or 16 digits. It would be nice to have insertion and selection be more consistent. It might be tricky though, since Farrago performs feats such as going back and forth between C++ and Java, and reducing constant expressions.

Double division is used for scaling in order to preserve fractions:
    14727272732511677L / 10000000000L = 1472727
    14727272732511677D / 1.0E10D = 1472727.2732511675

dynamobi-build commented 12 years ago

[author="rchen", created="Wed, 12 Sep 2007 13:11:45 -0500 (GMT-05:00)"] Trying to compare behavior with MySQL and noticed that they do not support cast to approximate types:

http://dev.mysql.com/doc/refman/5.1/en/cast-functions.html

As I commented earlier, Java Calc can be made to perform more precise cast by using the Java BigDecimal class instead of doing double divisions. This requires a few changes:

Decimal expansion rule change:

dynamobi-build commented 12 years ago

[author="jvs", created="Thu, 13 Sep 2007 03:41:39 -0500 (GMT-05:00)"] Thanks Rushan; deferring is fine, and we can use all of this info in the future to help us when we decide to tackle some or all of the issues.