apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0

[SEDONA-671] Implement Spider spatial data generator #1680

Status: Closed (Kontinuation closed 1 week ago)

Kontinuation commented 1 week ago

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This PR adds a new data source named "spider" for generating random spatial data. The generator follows the one described in:

Puloma Katiyar, Tin Vu, Sara Migliorini, Alberto Belussi, Ahmed Eldawy. "SpiderWeb: A Spatial Data Generator on the Web", ACM SIGSPATIAL 2020, Seattle, WA.

Users can generate random spatial data using:

df = spark.read.format("spider") \
  .option("N", "10000") \
  .option("distribution", "gaussian") \
  .option("geometryType", "box") \
  .load()

This data source is inspired by https://bitbucket.org/bdlabucr/beast/src/master/doc/spatial-data-generator.md
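
For readers unfamiliar with this kind of generator, here is a minimal pure-Python sketch of what "gaussian" boxes could look like conceptually. It is not the implementation in this PR. The function name `generate_gaussian_boxes`, the unit-square layout, the mean of 0.5, the standard deviation of 0.1, and the maximum box size of 0.01 are all assumptions made for illustration.

```python
import random


def generate_gaussian_boxes(n, max_width=0.01, max_height=0.01, seed=None):
    """Hypothetical helper: return n boxes as (xmin, ymin, xmax, ymax) tuples."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        # Box center drawn from a normal distribution
        # (assumed mean 0.5 and standard deviation 0.1 on both axes).
        cx = rng.gauss(0.5, 0.1)
        cy = rng.gauss(0.5, 0.1)
        # Box width and height drawn uniformly up to the assumed maximum.
        w = rng.uniform(0, max_width)
        h = rng.uniform(0, max_height)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Mirrors N=10000 from the example above; the seed makes the run repeatable.
boxes = generate_gaussian_boxes(10000, seed=42)
```

Passing a fixed seed keeps the generated data reproducible across runs, which is usually what you want for benchmarks and tests.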

How was this patch tested?

Added new Scala and Python tests.

Did this PR include necessary documentation updates?