Open geoHeil opened 7 years ago
Theoretically, you can do anything supported by a Spark SQL DataFrame on a Simba DataFrame, since Simba's DataFrame inherits from Spark SQL's.
To represent additional fields, you simply add them to your structure. For example, you can define:
case class PointData(x: Point, payload: Int, tag: String)
And Simba will automatically detect its fields and build the DataFrame. It will give you a schema like:
DataFrame
 |-- x : ShapeType
 |-- payload : Integer
 |-- tag : String
I see. And what about polygons? You seem to use Polygon.apply(Array(Point(Array(-1.0, -1.0)), Point(Array(1.0, -1.0)),
If I have WKT polygon strings how could these be converted?
So, assuming a DataFrame with polygons like below:
case class MyClass(a: String, b: Int, wktString: String)
val df = Seq(MyClass("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"), MyClass("b",2,"POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")).toDS()
val dfGeom = df.map(x => Polygon.fromWKB(x.wktString.toCharArray.map(_.toByte)))
is this how the conversion is supposed to be?
For me this fails with a code generator exception when calling dfGeom.show:
17/03/20 20:26:50 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 102, Column 31: Assignment conversion not possible from type "org.apache.spark.sql.simba.spatial.Shape" to type "org.apache.spark.sql.simba.spatial.Polygon"
I think you can try this:
case class MyClass(a: String, b: Int, wktString: Polygon)
val df = Seq(("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"),("b",2,"POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")).map(x => MyClass(x._1, x._2, Polygon.fromWKB(x._3.toCharArray.map(_.toByte)))).toDS()
df.show()
I don't know if it can work, but you can try.
This fails with com.vividsolutions.jts.io.ParseException: Unknown WKB type 71 as soon as it tries to parse the string.
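A likely explanation, sketched in plain Scala (this is an illustration, not Simba or JTS code): toCharArray.map(_.toByte) just re-encodes the WKT text as its ASCII bytes, so a WKB reader never sees the binary layout it expects.

```scala
object WktVsWkb {
  def main(args: Array[String]): Unit = {
    val wkt = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"
    // This only yields the ASCII bytes of the text, not a WKB encoding.
    val bytes = wkt.toCharArray.map(_.toByte)
    println(bytes.take(4).mkString(", ")) // 80, 79, 76, 89 -- 'P','O','L','Y'
    // A real WKB stream starts with a byte-order flag (0x00 or 0x01) followed
    // by a 4-byte geometry type code, so a WKB parser rejects this input.
  }
}
```

In other words, the input is WKT (text) and needs a WKT reader; no byte-level conversion of the string will turn it into WKB.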
Well, I think this is a parsing problem of JTS, which is out of my scope now. And just as a reminder, general geometric objects, including polygons, are still under development.
What about:
def toPolygon(s: String, u: String): SPolygon = {
  @transient lazy val reader = new WKTReader()
  reader.read(s) match {
    case poly: Polygon =>
      poly.setUserData(u)
      SPolygon.fromJTSPolygon(poly)
  }
}
val df = Seq(("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"),("b",2,"POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")).map(x => MyClass(x._1, x._2, toPolygon(x._3, "foobar"))).toDS
df.show
not sure if it will join later on, but df.show works.
df.show()
should work. There must be something wrong with my fromWKT function.
Nevertheless, I don't think it will work for joins, since our current join algorithm does not support polygons. Technically, that is because there is no partitioner for polygons and the algorithm assumes the join keys evaluate to Points, a legacy of the original prototype, which was designed just for points. I still treat partitioning general geometry objects as a research problem.
Does Simba have some UDF to support creating a Simba DataFrame out of a regular DataFrame? E.g. like Magellan's
df.withColumn("point", point('x, 'y))
If I am required to manually map all points / polygons to Simba geometry, how can I represent additional fields?
val ps = (0 until 10000).map(x => PointData(Point(Array(x.toDouble, x.toDouble)), x + 1)).toDS
How can I parse WKT polygons to a simba supported geometry format?
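For reference, the toPolygon helper above delegates the actual parsing to JTS's WKTReader. To illustrate what such a reader extracts from a simple polygon, here is a plain-Scala sketch; parseWktPolygon is a hypothetical helper written only for this example (it handles just a single outer ring and is not part of Simba or JTS).

```scala
object WktSketch {
  // Hypothetical helper for illustration: extracts the outer ring of a
  // simple WKT polygon as (x, y) pairs. Real code should use JTS's WKTReader.
  def parseWktPolygon(wkt: String): Array[(Double, Double)] = {
    val inner = wkt.stripPrefix("POLYGON ((").stripSuffix("))")
    inner.split(",").map { pair =>
      // Each coordinate is "x y", whitespace-separated.
      val Array(x, y) = pair.trim.split("\\s+").map(_.toDouble)
      (x, y)
    }
  }

  def main(args: Array[String]): Unit = {
    val ring = parseWktPolygon("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))")
    println(ring.length) // 5
    println(ring.head)   // (30.0,10.0)
  }
}
```

Note the ring is closed: the first and last coordinates are the same point, which is what a real reader validates before building a polygon.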