apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

H3 index bug #7766

Open TheWinds opened 2 years ago

TheWinds commented 2 years ago

When using the H3 index, a distance filter can, in some cases, return rows that lie outside the requested distance.

Test case

Data

geodata.csv

name,lng,lat
a,-103.34056886616187,20.63611463218247
b,-103.31104310932645,20.606231326034603
c,-103.33524736347661,20.595626162929634

schema

{
    "metricFieldSpecs": [],
    "dimensionFieldSpecs": [
        {
            "dataType": "STRING",
            "name": "name"
        },
        {
            "dataType": "FLOAT",
            "name": "lng"
        },
        {
            "dataType": "FLOAT",
            "name": "lat"
        },
        {
            "dataType": "BYTES",
            "name": "location_st_point",
            "transformFunction": "toSphericalGeography(stPoint(lng,lat))"
        }
    ],
    "schemaName": "geodata"
}

table

{
    "tableName": "geodata",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "1",
      "segmentPushType": "APPEND",
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
      "schemaName": "geodata",
      "replication": "1"
    },
    "tenants": {
    },
    "fieldConfigList": [
      {
        "name": "location_st_point",
        "encodingType": "RAW",
        "indexType": "H3",
        "properties": {
          "resolutions": "7"
        }
      }
    ],
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "noDictionaryColumns": [
        "location_st_point"
      ]
    },
    "metadata": {
      "customConfigs": {
      }
    }
  }

Query

select name,
    lat,
    lng,
    ST_DISTANCE(
        location_st_point,
        ST_Point(-103.34417375507813, 20.64061268636347, 1)
    ) as distance
from geodata
where ST_DISTANCE(
        location_st_point,
        ST_Point(-103.34417375507813, 20.64061268636347, 1)
    ) < 5000;

Query Response

name,lat,lng,distance
a,20.636114,-103.34057,625.2019871997669
b,20.606232,-103.31104,5148.1557916677875
c,20.595627,-103.33525,5087.813601469894
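
The distances in the response can be re-checked independently of Pinot. The sketch below is a plain haversine computation on the full-precision coordinates from geodata.csv. It assumes a spherical Earth with a mean radius of 6371008.8 m; Pinot's own ST_DISTANCE may use a slightly different radius, and the table stores the coordinates as FLOAT, so small differences from the values above are expected. The class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper, not part of Pinot: re-checks the distances in the query response.
final class GeoDistanceCheck {
    // Mean Earth radius in meters (an assumption; Pinot may use a slightly different value).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    // Great-circle (haversine) distance in meters between two (lng, lat) points in degrees.
    static double haversineMeters(double lng1, double lat1, double lng2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Query point from the WHERE clause.
        double qLng = -103.34417375507813;
        double qLat = 20.64061268636347;
        String[] names = {"a", "b", "c"};
        // {lng, lat} pairs from geodata.csv.
        double[][] points = {
            {-103.34056886616187, 20.63611463218247},
            {-103.31104310932645, 20.606231326034603},
            {-103.33524736347661, 20.595626162929634},
        };
        for (int i = 0; i < names.length; i++) {
            double d = haversineMeters(qLng, qLat, points[i][0], points[i][1]);
            System.out.printf("%s: %.1f m, within 5000 m: %b%n", names[i], d, d < 5000);
        }
    }
}
```

Under these assumptions only row a falls inside the 5000 m radius, so rows b and c should have been filtered out.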

Analysis

Rows b and c are returned even though their distances exceed the 5000 threshold.

**This appears to be a problem with the kRing calculations.**

Code

https://github.com/apache/pinot/blob/e3d238ac1d8633331d9507713266e41e6b40f870/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/H3IndexFilterOperator.java#L186-L198

Visualize

[Image: poc]

Jackie-Jiang commented 2 years ago

Thanks for reporting this. What is the edgeLength for your H3 index? Based on the visualization, I think (int) Math.floor((distance / _edgeLength - 2) / 1.7321) should be 0 (distance should be less than (2 + 1.7321) * _edgeLength).

TheWinds commented 2 years ago

> Thanks for reporting this. What is the edgeLength for your H3 index? Based on the visualization, I think (int) Math.floor((distance / _edgeLength - 2) / 1.7321) should be 0 (distance should be less than (2 + 1.7321) * _edgeLength).

The _edgeLength for resolution 7 is 1220.629759 (meters), so:

Math.floor(( 5000 / 1220.629759 - 2 ) / 1.7321) == 1
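
For readers who want to reproduce the arithmetic, here is a minimal sketch of the expression quoted in this thread. It only mirrors the formula as written in the comments; the class and method names are hypothetical, and it is not a copy of H3IndexFilterOperator.

```java
// Hypothetical helper mirroring the expression quoted in this thread; not Pinot's actual class.
final class KRingArithmetic {
    // Edge length (meters) for H3 resolution 7, as quoted in this thread.
    static final double EDGE_LENGTH_RES_7 = 1220.629759;
    // Approximation of sqrt(3) used in the quoted expression.
    static final double SQRT_3 = 1.7321;

    // The quoted expression: (int) Math.floor((distance / edgeLength - 2) / 1.7321).
    static int ringCount(double distance, double edgeLength) {
        return (int) Math.floor((distance / edgeLength - 2) / SQRT_3);
    }

    // Distances below this bound make the expression evaluate to less than 1
    // (0 or negative): (2 + 1.7321) * edgeLength.
    static double firstRingThreshold(double edgeLength) {
        return (2 + SQRT_3) * edgeLength;
    }

    public static void main(String[] args) {
        System.out.println("ringCount(5000) = " + ringCount(5000, EDGE_LENGTH_RES_7));
        System.out.printf("threshold = %.2f m%n", firstRingThreshold(EDGE_LENGTH_RES_7));
    }
}
```

With the quoted edge length the bound is roughly 4555.5 m, so a 5000 m query distance lands just above it and the expression evaluates to 1 rather than 0.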

You can open this link to see the visualization: https://codepen.io/jsthewinds/pen/GRvwXoe

The center point is draggable; drag it or modify the config params to observe the behavior.

Jackie-Jiang commented 2 years ago

I suspect the _edgeLength is not accurate. Based on the visualization, the diameter of the circle should be less than 7 * _edgeLength, but 10000 > 8 * 1220.629759 (about 9765).
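
The comparison above can be spelled out numerically. This is a small sketch that expresses the query circle's diameter in units of the quoted edge length; the class and method names are hypothetical.

```java
// Hypothetical helper: expresses a query circle's diameter in units of the H3 edge length.
final class DiameterCheck {
    static double diameterInEdgeLengths(double radiusMeters, double edgeLengthMeters) {
        return 2 * radiusMeters / edgeLengthMeters;
    }

    public static void main(String[] args) {
        // 5000 m query radius and the resolution 7 edge length quoted in this thread.
        double ratio = diameterInEdgeLengths(5000, 1220.629759);
        System.out.printf("diameter = %.2f edge lengths%n", ratio);
    }
}
```

With the quoted edge length the diameter comes out above 8 edge lengths, whereas the visualization suggests it should span fewer than 7.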

Jackie-Jiang commented 2 years ago

Adding @yupeng9