cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.04k stars 623 forks source link

System does not prevent INSERTS with reserved NULL values for BIGINT columns #849

Open apavlo opened 7 years ago

apavlo commented 7 years ago

We reserve certain values for the different types to represent NULL values. These are defined in 'limits.h':

https://github.com/cmu-db/peloton/blob/master/src/include/type/limits.h#L42-L50

I added a new test case that creates a table with each type and then tries to insert a value with the reserved value. The front-end throws a proper error to tell you that you are out of range for most types. It fails on BIGINT. It also fails on DECIMAL but I think it's an IEEE-754 issue.


Steps to recreate:

postgres=# \pset null '[NULL]' 
Null display is "[NULL]".
postgres=# CREATE TABLE tblBIGINT(id INT PRIMARY KEY, b BIGINT);
CREATE TABLE
postgres=# INSERT INTO tblBIGINT VALUES (1, -9223372036854775808);
INSERT 0 1
postgres=# SELECT * FROM tblBIGINT;
 id |   b
----+--------
  1 | [NULL]
(1 row)
apavlo commented 7 years ago

You recreate this in a test case by uncommenting this line:

https://github.com/cmu-db/peloton/blob/master/test/sql/typelimit_sql_test.cpp#L29

fdjingyuan commented 7 years ago

I carefully traced this error, finding that it's caused by the incorrect type conversion between int64_t and double. This problem not only causes an out-of-range number to be stored as a NULL, but also causes a valid number not to be stored correctly, which is a more severe bug.

For the input:

INSERT INTO tblBIGINT VALUES (1, -9223372036854775808);

The number "-9223372036854775808" is first parsed as a std::string and then casted into a double using std::stod at:

https://github.com/cmu-db/peloton/blob/d9d120bb451c9bd4d7fb3993fef47c5c5cba06ab/src/parser/postgresparser.cpp#L454-L457

However, using the double type to store an int64 number will result in precision loss. As -9223372036854775808 will be stored as -9.2233720368547758E18 (-9223372036854775800) in the system.

Then comes the "out of range check", which compares this double num with the int64 range:

https://github.com/cmu-db/peloton/blob/d9d120bb451c9bd4d7fb3993fef47c5c5cba06ab/src/type/decimal_type.cpp#L289-L295

I wrote a simple code to show what happens and how "-9223372036854775808" passes the range check:

image

A more serious problem is that the BIGINT type will store a wrong number because of the precision loss. To show, try the following code:

CREATE TABLE test2(id BIGINT);
INSERT INTO test2 VALUES (9223372036854775000);
select * from test2;

9223372036854775000 is a valid int64 number but it is stored as 9223372036854774784:

image

To fix this problem, I think we may first convert all numbers to long double instead of double. However, the Value class doesn't offer a long double type yet, so it's not easy to implement it in a few lines. Maybe you can find better solution to fix this problem.

tomasic commented 7 years ago

I suspect the long double solution will have problems. The key is to never convert an integer to a float during the insert.

On Sat, Nov 18, 2017 at 7:29 AM Jingyuan Liu notifications@github.com wrote:

I carefully traced this error, finding that it's caused by the incorrect type conversion between int64_t and double. This problem not only causes an out-of-range number to be stored as a NULL, but also causes a valid number not to be stored correctly, which is a more severe bug.

For the input:

INSERT INTO tblBIGINT VALUES (1, -9223372036854775808);

The number "-9223372036854775808" is first parsed as a std::string and then casted into a double using std::stod at:

https://github.com/cmu-db/peloton/blob/d9d120bb451c9bd4d7fb3993fef47c5c5cba06ab/src/parser/postgresparser.cpp#L454-L457

However, using the double type to store an int64 number will result in precision loss. As -9223372036854775808 will be stored as -9.2233720368547758E18 (-9223372036854775800) in the system.

Then comes the "out of range check", which compares this double num with the int64 range:

https://github.com/cmu-db/peloton/blob/d9d120bb451c9bd4d7fb3993fef47c5c5cba06ab/src/type/decimal_type.cpp#L289-L295

I wrote a simple code to show what happens and how "-9223372036854775808" passes the range check:

[image: image] https://user-images.githubusercontent.com/26754001/32980654-88080898-cca5-11e7-8637-36ca37f4c81f.png

A more serious problem is that the BIGINT type will store a wrong number because of the precision loss. To show, try the following code:

CREATE TABLE test2(id BIGINT); INSERT INTO test2 VALUES (9223372036854775000); select * from test2;

9223372036854775000 is a valid int64 number but it is stored as 9223372036854774784:

[image: image] https://user-images.githubusercontent.com/26754001/32980664-99c49754-cca5-11e7-8980-170e8d0b89ae.png

To fix this problem, I think we may first convert all numbers to long double instead of double. However, the Value class https://github.com/cmu-db/peloton/blob/02670dab2f08bd002e3958a5a122887874274148/src/include/type/value.h#L320-L332 doesn't offer a long double type yet, so it's not easy to implement it in a few lines. Maybe you can find better solution to fix this problem.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/849#issuecomment-345442393, or mute the thread https://github.com/notifications/unsubscribe-auth/ABS8HCIyW4NKwZN6SNJhtcI1-i7Rz9qiks5s3tvQgaJpZM4QABis .

-- Anthony Tomasic Language Technologies Institute Carnegie Mellon University http://www.tiramisutransit.com http://mcds.cs.cmu.edu http://mcds.cs.cmu.edu http://www.cs.cmu.edu/~tomasic

seojungmin commented 7 years ago

This seems to me it happens from the same cause as #829, by eagerly transforming into an expression before exactly knowing what it should do.