[SEDONA-671] Implement Spider spatial data generator

apache / sedona

A cluster computing framework for processing large-scale geospatial data

https://sedona.apache.org/

Apache License 2.0

[SEDONA-671] Implement Spider spatial data generator #1680

Status: Closed (by Kontinuation, 1 week ago)

Kontinuation commented 1 week ago

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide.

Is this PR related to a JIRA ticket?

Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-671. The PR name follows the format [SEDONA-XXX] my subject.

What changes were proposed in this PR?

This PR adds a new data source named "spider" for generating random spatial data.

Reference: Puloma Katiyar, Tin Vu, Sara Migliorini, Alberto Belussi, Ahmed Eldawy. "SpiderWeb: A Spatial Data Generator on the Web", ACM SIGSPATIAL 2020, Seattle, WA.

Users can generate random spatial data using:

# Assumes `spark` is an existing SparkSession with Apache Sedona available.
df = spark.read.format("spider") \
  .option("N", "10000") \
  .option("distribution", "gaussian") \
  .option("geometryType", "box") \
  .load()

This data source is inspired by https://bitbucket.org/bdlabucr/beast/src/master/doc/spatial-data-generator.md
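To make the options above more concrete, here is a minimal pure-Python sketch of what a "gaussian" distribution combined with a "box" geometry type could produce. It is not the code in this PR. The function names (generate_gaussian_boxes, to_wkt), the parameters (max_width, max_height, mean, stddev, seed) and their default values are all hypothetical and chosen only for illustration; the actual data source may use different parameters and defaults.

```python
import random
from typing import List, Optional, Tuple

# A box is stored as (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def generate_gaussian_boxes(
    n: int,
    max_width: float = 0.01,   # hypothetical default, not taken from the PR
    max_height: float = 0.01,  # hypothetical default, not taken from the PR
    mean: float = 0.5,         # assumed center of the distribution
    stddev: float = 0.1,       # assumed spread of the distribution
    seed: Optional[int] = None,
) -> List[Box]:
    """Generate n boxes whose centers follow a normal distribution."""
    rng = random.Random(seed)  # local RNG so a seed gives repeatable output
    boxes: List[Box] = []
    for _ in range(n):
        # Box center: each coordinate is drawn independently.
        cx = rng.gauss(mean, stddev)
        cy = rng.gauss(mean, stddev)
        # Box size: uniform between 0 and the configured maximum.
        w = rng.uniform(0.0, max_width)
        h = rng.uniform(0.0, max_height)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def to_wkt(box: Box) -> str:
    """Render a box as a closed WKT polygon ring."""
    xmin, ymin, xmax, ymax = box
    return (
        f"POLYGON (({xmin} {ymin}, {xmax} {ymin}, "
        f"{xmax} {ymax}, {xmin} {ymax}, {xmin} {ymin}))"
    )


if __name__ == "__main__":
    for b in generate_gaussian_boxes(3, seed=42):
        print(to_wkt(b))
```

Passing a fixed seed keeps the output repeatable, which is convenient when generated data is used in tests or benchmarks.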

How was this patch tested?

Added new Scala and Python tests.

Did this PR include necessary documentation updates?

Yes, I have updated the documentation.