apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.38k stars 1.26k forks source link

FLOAT missing from Literal datatypes, but defined in Column datatypes. #6814

Open amrishlal opened 3 years ago

amrishlal commented 3 years ago

FLOAT is missing from Literal datatypes (org.apache.pinot.common.request.Literal) but defined as a column datatype (org.apache.pinot.spi.data.FieldSpec.DataType). This may create type conversion issues since the only way to specify values for FLOAT column is by using DOUBLE literal values.

Literal defines the following datatypes: BOOL, BYTE, SHORT, INT, LONG, DOUBLE, STRING, BINARY

FieldSpec.DataType defines the following datatypes: INT, LONG, FLOAT, DOUBLE, BOOLEAN/ Stored as STRING /, STRING, BYTES, STRUCT, MAP, LIST;

Jackie-Jiang commented 3 years ago

The literal type should not matter. We should convert the literal values based on the field data type, and as long as the values can be represented by the literal, we are good. Actually on the server side, literal is always stored as string

amrishlal commented 3 years ago

From what I saw server is converting a string literal that appears in the predicate at least twice to an integer value and this is after Broker/Calcite has already parsed the string as number and possibly operated on (plus/minus scalar functions). So while it works, there is a bit of cost associated with it. There are also potential correctness issues. For example long to double conversion (which is happening in ScalarFunction.java on the Broker side) is a lossy conversion. None of these issues by themselves are major, but combined together they seem to indicate that some cleanup with type handling would be useful.

Jackie-Jiang commented 3 years ago

I agree that always using string literal is not efficient (legacy behavior carried over from PQL). For literal, we should keep only the necessary types as long as the value can be represented correctly without losing accuracy. Keeping LONG, DOUBLE and STRING should be able to represent all types supported by Pinot. We don't want to add too many literal types because that will complicate the logic of type conversion from literal to pinot type.

Jackie-Jiang commented 11 months ago

To achieve standard SQL behavior, we need to preserve the literal type. Submitted #11697 to represent float literal