apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0
1.96k stars, 692 forks

[SEDONA-664] Add native GeoPackage reader #1603

Closed: Imbruced closed this 1 month ago

Imbruced commented 1 month ago

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

Yes, SEDONA-664.

What changes were proposed in this PR?

Adds a native GeoPackage data source.

How was this patch tested?

Integration tests.

Did this PR include necessary documentation updates?

Imbruced commented 1 month ago

As a follow-up to this PR, I need to add:

Kontinuation commented 1 month ago

Thank you for this great work! My major concern is whether it works with GeoPackage files stored on cloud storage such as HDFS or S3. As far as I know, org.xerial:sqlite-jdbc only works with local files, so we may have to download the GeoPackage file from cloud storage to the local file system before reading it.
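A minimal sketch of the download-then-read approach described above, written in Python with the standard-library sqlite3 module instead of sqlite-jdbc (SQLite needs a real local path in both cases). It assumes the HDFS/S3 client hands back a binary file-like object; the function name list_gpkg_tables is hypothetical. gpkg_contents is the table the OGC GeoPackage spec uses to register content tables.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import BinaryIO, List


def list_gpkg_tables(remote_stream: BinaryIO) -> List[str]:
    """Hypothetical helper: stage a GeoPackage locally, then list its tables.

    `remote_stream` stands in for whatever an HDFS or S3 client returns
    (assumption: any readable binary file-like object).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / "staged.gpkg"
        # Copy the remote bytes onto the local file system in chunks,
        # because SQLite can only open files that exist locally.
        with open(local_path, "wb") as local_file:
            shutil.copyfileobj(remote_stream, local_file)
        # Open the staged copy read-only via an SQLite URI.
        conn = sqlite3.connect(local_path.as_uri() + "?mode=ro", uri=True)
        try:
            # gpkg_contents registers every feature/tile table in the file.
            rows = conn.execute(
                "SELECT table_name FROM gpkg_contents ORDER BY table_name"
            ).fetchall()
        finally:
            # Close before the temp directory is deleted.
            conn.close()
    return [row[0] for row in rows]
```

The same idea applies on the JVM side: copy the object to a local temporary file first, then pass that local path to the JDBC driver. The cost is an extra copy per file, which matters mostly for very large GeoPackages.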

Imbruced commented 1 month ago

@Kontinuation Good point. I'll write a test to make sure it works. An integration test with MinIO should be enough, I think.

Imbruced commented 1 month ago

WIP

Imbruced commented 1 month ago

@Kontinuation Thanks for the review!

atiannicelli commented 1 month ago

@jiayuasu I can't wait to use this for Overture Buildings Conflation. We are looking at a couple of datasets that are packaged as .gpkg files.

Imbruced commented 1 month ago

@jiayuasu Removed the show method calls.

Imbruced commented 1 month ago

@jiayuasu I applied the changes. I noticed that previous pipeline runs keep running even after new ones start, so I think it's worth adding a concurrency mechanism to our GitHub Actions workflows: https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#concurrency