apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.5k stars 1.29k forks source link

Handle `IS NULL/IS NOT NULL` filtering when ExpressionTransformer outputs `null` #14433

Open itschrispeck opened 2 days ago

itschrispeck commented 2 days ago

Take the transformation:

        {
          "columnName": "col1",
          "transformFunction": "CASE WHEN col2 IS NOT NULL THEN col2 ELSE null END"
        },

Although the function returns null, typical null value handling does not apply. Ideally, if an ExpressionTransformer returns null, the value can still be filtered via IS NULL/IS NOT NULL. To maintain backwards compatibility we could add a new config per transformFunction, enableNullHandling:

        {
          "columnName": "col1",
          "transformFunction": "CASE WHEN col2 IS NOT NULL THEN col2 ELSE null END",
          "enableNullHandling": true
        },
Jackie-Jiang commented 10 hours ago

cc @swaminathanmanish

What is the current behavior? What is the value produced when the expression returns null? We should just use the setting in table config instead of introducing a new flag.

itschrispeck commented 8 hours ago

Current behavior is that the value is an empty string e.g. select col1 from tbl where col1 is not null gives:

  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "col1"
      ],
      "columnDataTypes": [
        "STRING"
      ]
    },
    "rows": [
      [
        ""
      ],
      ...
  }

Conversely, is null does not match any docs.

I'd also prefer avoiding a new config, I wasn't sure if this should be considered a bug since changing the behavior would be backwards incompatible. Though looking at this briefly, it seems NullValueTransformer is intended to handle null output from ExpressionTransformer

Jackie-Jiang commented 6 hours ago

Correct. I feel the real issue is that null is not properly set in the ExpressionTransformer