apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0
1.96k stars, 692 forks

[SEDONA-673] Fix issue when loading geoparquet file without bbox metadata. #1681

Closed: Imbruced closed this pull request 6 days ago

Imbruced commented 1 week ago

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

This PR is related to JIRA ticket SEDONA-673.

What changes were proposed in this PR?

When reading a GeoParquet file that has no bbox in its metadata, a query like the one below fails with an IndexOutOfBoundsException:

sparkSession.read
        .format("geoparquet")
        .load(overtureBBOX)
        .where("ST_Intersects(geometry, ST_PolygonFromEnvelope(0, 0, 1, 1))")
        .count()

The failure happens while evaluating the spatial predicate pushdown, which uses the bbox from the file's GeoParquet metadata.
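The PR text does not include the patch itself. As a rough illustration of the failure mode, here is a minimal Scala sketch of bbox-based file pruning that tolerates missing metadata. It assumes the crash came from indexing into an empty bbox list. `Envelope`, `BboxPruning`, `envelopeOf` and `mayMatch` are hypothetical names, not Sedona's actual classes or API.

```scala
// Hypothetical sketch of metadata-based pruning; NOT Sedona's implementation.
final case class Envelope(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // Axis-aligned rectangle intersection test (touching edges count as intersecting).
  def intersects(other: Envelope): Boolean =
    minX <= other.maxX && other.minX <= maxX &&
      minY <= other.maxY && other.minY <= maxY
}

object BboxPruning {
  // Turns the optional GeoParquet "bbox" metadata value into an envelope.
  // The GeoParquet spec makes bbox optional: 4 values for 2D
  // ([xmin, ymin, xmax, ymax]) or 6 values for 3D
  // ([xmin, ymin, zmin, xmax, ymax, zmax]).
  def envelopeOf(bbox: Seq[Double]): Option[Envelope] = bbox match {
    case Seq(minX, minY, maxX, maxY)       => Some(Envelope(minX, minY, maxX, maxY))
    case Seq(minX, minY, _, maxX, maxY, _) => Some(Envelope(minX, minY, maxX, maxY))
    case _                                 => None // missing or malformed: no usable statistics
  }

  // Returns false only when the file provably cannot match the query window.
  // With no usable bbox the file is kept instead of indexing bbox(0) blindly.
  def mayMatch(bbox: Seq[Double], query: Envelope): Boolean =
    envelopeOf(bbox).forall(_.intersects(query))
}
```

The design choice is that pruning is only an optimization: when statistics are absent, keeping the file never changes the query result, whereas throwing (or wrongly skipping the file) does.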

How to reproduce

Download the data with the following command:

uv run overturemaps download --bbox 0,0,1,1 -f geoparquet --type=place -o bbox.geoparquet

Then try to load it with a spatial predicate such as ST_Contains or ST_Within.

How was this patch tested?

A unit test that reproduces the reported issue.

Did this PR include necessary documentation updates?