InitialDLab / Simba

Spatial In-Memory Big data Analytics
Apache License 2.0

convert regular DF to simba supported geometries #87

Open geoHeil opened 7 years ago

geoHeil commented 7 years ago

Does Simba have some UDF that supports creating a Simba DataFrame out of a regular DataFrame? I.e., something like Magellan's df.withColumn("point", point('x, 'y))

If I am required to manually map all points / polygons to Simba geometries, how can I represent additional fields? For example:

val ps = (0 until 10000).map(x => PointData(Point(Array(x.toDouble, x.toDouble)), x + 1)).toDS

How can I parse WKT polygons into a Simba-supported geometry format?

dongx-psu commented 7 years ago

Theoretically, you can do anything to a Simba DataFrame that Spark SQL supports on a regular DataFrame, since Simba's DataFrame inherits from Spark SQL's.

To represent additional fields, you simply add them to your structure. For example, you can define:

case class PointData(x: Point, payload: Int, tag: String)

Simba will then detect its fields automatically and build the DataFrame, giving you a schema like:

-- DataFrame
 |----- x : ShapeType
 |----- payload : Integer
 |----- tag : String

geoHeil commented 7 years ago

I see. And what about polygons? You seem to use Polygon.apply(Array(Point(Array(-1.0, -1.0)), Point(Array(1.0, -1.0)), ... but if I have WKT polygon strings, how could these be converted?
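For the simplest case, a single-ring POLYGON with no holes, the vertex coordinates can be pulled out of the WKT string by hand. The sketch below is a hypothetical helper (parseOuterRing is not part of Simba or JTS) that only accepts the form "POLYGON ((x1 y1, x2 y2, ...))" with 2D coordinates and rejects anything with interior rings; for general WKT, a real parser such as JTS's WKTReader is the safer choice.

```scala
object WktPolygon {
  // Hypothetical helper, not part of Simba or JTS.
  // Handles only "POLYGON ((x1 y1, x2 y2, ...))": one outer ring, no holes, 2D coordinates.
  // Returns one Array(x, y) per vertex, in the order they appear in the string.
  def parseOuterRing(wkt: String): Array[Array[Double]] = {
    val s = wkt.trim
    require(s.toUpperCase.startsWith("POLYGON"), s"not a POLYGON: $wkt")
    val open = s.indexOf("((")
    val close = s.indexOf(")", open + 2)
    require(open >= 0 && close > open, s"malformed POLYGON: $wkt")
    // Anything after the first ring other than the final "))" means interior rings (holes).
    require(s.substring(close).filterNot(_.isWhitespace) == "))", s"holes are not supported: $wkt")
    s.substring(open + 2, close).split(",").map { pair =>
      val parts = pair.trim.split("\\s+")
      require(parts.length == 2, s"expected 'x y', got '$pair'")
      Array(parts(0).toDouble, parts(1).toDouble)
    }
  }
}
```

Each resulting pair could then be wrapped as Point(Array(x, y)) and passed to Polygon.apply, mirroring the construction quoted above; that last step is an untested assumption about Simba's API.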

dongx-psu commented 7 years ago

Refer to:

https://github.com/InitialDLab/Simba/blob/standalone-2.1/src/main/scala/org/apache/spark/sql/simba/spatial/Polygon.scala#L117

geoHeil commented 7 years ago

So, assuming a DataFrame with polygons like the one below:

case class MyClass(a: String, b: Int, wktString: String)
val df = Seq(MyClass("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"), MyClass("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")).toDS()
val dfGeom = df.map(x => Polygon.fromWKB(x.wktString.toCharArray.map(_.toByte)))

Is this how the conversion is supposed to work? For me, this fails with a code generator exception when calling dfGeom.show:

17/03/20 20:26:50 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 102, Column 31: Assignment conversion not possible from type "org.apache.spark.sql.simba.spatial.Shape" to type "org.apache.spark.sql.simba.spatial.Polygon"

dongx-psu commented 7 years ago

I think you can try this:

case class MyClass(a: String, b: Int, wktString: Polygon)
val df = Seq(("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"), ("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))"))
  .map(x => MyClass(x._1, x._2, Polygon.fromWKB(x._3.toCharArray.map(_.toByte))))
  .toDS()
df.show()

I'm not sure it will work, but you can try.

geoHeil commented 7 years ago

This already fails when parsing the WKT, with com.vividsolutions.jts.io.ParseException: Unknown WKB type 71.
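The type code 71 in that exception can be reproduced by hand, which suggests the input is the issue: toCharArray.map(_.toByte) yields the ASCII bytes of the WKT text, not WKB (the binary encoding that fromWKB expects). The sketch below assumes a WKB reader that, like JTS's WKBReader, reads one byte-order byte, then a 4-byte geometry-type word whose low byte is the type code, falling back to big-endian when the first byte is not the little-endian flag. On "POLYGON ((...", byte 0 is 'P' and bytes 1..4 are "OLYG", so the low byte is 'G' = 0x47 = 71.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object WkbMisread {
  // Mimics the first two steps of a WKB reader (assumed to match JTS's WKBReader):
  //   byte 0     : byte-order flag (1 = little-endian; anything else is read as big-endian here)
  //   bytes 1..4 : 32-bit geometry-type word; its low byte is the type code (3 = Polygon)
  def typeCode(bytes: Array[Byte]): Int = {
    val order = if (bytes(0) == 1) ByteOrder.LITTLE_ENDIAN else ByteOrder.BIG_ENDIAN
    val typeWord = ByteBuffer.wrap(bytes, 1, 4).order(order).getInt
    typeWord & 0xff
  }
}
```

Fed the bytes of "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))" produced exactly as in the code above, typeCode returns 71, matching the reported error; a genuine little-endian WKB polygon header (01 03 00 00 00) gives 3.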

dongx-psu commented 7 years ago

Well, I think this is a JTS parsing problem, which is out of my scope for now. Just a reminder: support for general geometric objects, including polygons, is still under development.

geoHeil commented 7 years ago

What about:

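import com.vividsolutions.jts.geom.Polygon
import com.vividsolutions.jts.io.WKTReader
import org.apache.spark.sql.simba.spatial.{Polygon => SPolygon}

case class MyClass(a: String, b: Int, wktString: SPolygon)
// Parse a WKT string with JTS, attach user data, and convert the result to Simba's Polygon
def toPolygon(s: String, u: String): SPolygon = {
  val reader = new WKTReader()
  reader.read(s) match {
    case poly: Polygon =>
      poly.setUserData(u)
      SPolygon.fromJTSPolygon(poly)
    case other =>
      throw new IllegalArgumentException(s"Expected a POLYGON, got ${other.getGeometryType}")
  }
}
// Requires the session's implicits (e.g. spark.implicits._) in scope for .toDS
val df = Seq(("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"), ("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))"))
  .map(x => MyClass(x._1, x._2, toPolygon(x._3, "foobar")))
  .toDS
df.show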

Not sure whether it will work in joins later on, but df.show works.

dongx-psu commented 7 years ago

df.show() should work. There must be something wrong with my fromWKT function.

Nevertheless, I don't think it will work for joins, since our current join algorithm does not support polygons. Technically, this is because there is no partitioner for polygons, and the join algorithm assumes the join keys evaluate to Points. This comes from legacy hacks in the original prototype, which was designed only for points. I still consider partitioning general geometric objects a research problem.
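Why a point-only partitioner does not carry over to polygons can be seen with a toy uniform grid. This is a hypothetical illustration, not Simba's actual partitioning scheme: a point always falls into exactly one cell, while a polygon (approximated here by the bounding box of its vertices) can overlap several, so a partition-based join would have to replicate polygons across partitions or pick some assignment rule.

```scala
object GridSketch {
  // Toy n x n uniform grid over [minX, maxX] x [minY, maxY].
  // Hypothetical illustration only; this is not Simba's partitioner.
  final case class Grid(minX: Double, minY: Double, maxX: Double, maxY: Double, n: Int) {
    // Map a coordinate to a column/row index, clamped to the grid.
    private def col(x: Double): Int =
      math.min(n - 1, math.max(0, ((x - minX) / (maxX - minX) * n).toInt))
    private def row(y: Double): Int =
      math.min(n - 1, math.max(0, ((y - minY) / (maxY - minY) * n).toInt))

    // A point lands in exactly one cell, so it gets exactly one partition id.
    def cellOfPoint(x: Double, y: Double): Int = row(y) * n + col(x)

    // A polygon's bounding box can overlap several cells,
    // so it maps to a set of partition ids rather than a single one.
    def cellsOfPolygon(ring: Seq[(Double, Double)]): Seq[Int] = {
      val xs = ring.map(_._1)
      val ys = ring.map(_._2)
      for {
        r <- row(ys.min) to row(ys.max)
        c <- col(xs.min) to col(xs.max)
      } yield r * n + c
    }
  }
}
```

On a 4 x 4 grid over [0, 100] x [0, 100], the point (30, 10) maps to the single cell 1, while the first example polygon from this thread, POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)), spans cells 0, 1, 4 and 5.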