Open ricg72 opened 7 months ago
I suspect the assert at https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_viz/expressions/Pixelize.scala line 119: assert(pixels.size() > 0)
@ricg72 ST_Pixelize is not supposed to return 0 pixels. For any geometry (polygons, points, ...), it should return at least 1 pixel. There might be something wrong with the logic itself. Do you want to take a stab?
Hi, what would the spec be? If the object falls within a single pixel, then displaying that single pixel is OK. It's the longer, thin items that I am not clear how to display. Maybe the algorithm could convert the polygon to a line (skeletonize?) and then draw those pixels?
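To make the spec question concrete, here is a minimal sketch of one possible rule. It is not Sedona code and I have not checked how Pixelize.scala actually maps coordinates. It assumes a geometry is reduced to its envelope and that a pixel is counted when an integer grid line falls inside the envelope on that axis; PixelSpecSketch, Pixel, axisCells and pixelizeBox are hypothetical names. When no grid line falls inside the envelope on an axis, it falls back to the cell holding the midpoint, so a sub-pixel box yields one pixel and a long, thin box yields a row of pixels rather than nothing.

```scala
object PixelSpecSketch {
  // Hypothetical pixel type, for illustration only.
  final case class Pixel(x: Int, y: Int)

  // Integer cells covered by [lo, hi] on one axis.
  // If no integer lies in the interval, fall back to the cell containing the midpoint.
  def axisCells(lo: Double, hi: Double): Range = {
    val covered = math.ceil(lo).toInt to math.floor(hi).toInt
    if (covered.nonEmpty) covered
    else {
      val c = math.floor((lo + hi) / 2).toInt
      c to c
    }
  }

  // Pixelize an axis-aligned box; always returns at least one pixel.
  def pixelizeBox(minX: Double, minY: Double, maxX: Double, maxY: Double): Seq[Pixel] =
    for {
      x <- axisCells(minX, maxX)
      y <- axisCells(minY, maxY)
    } yield Pixel(x, y)
}
```

With the 3.1 to 3.8 square from the repro this returns the single pixel (3, 3), and a box spanning x = 0.2 to 9.7 at the same height returns the nine pixels (1, 3) through (9, 3). Skeletonizing the polygon would follow curved thin shapes more faithfully; the envelope fallback is just the simplest rule that never returns zero pixels.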
I was actually trying to use ST_Pixelize to get all the pixel coordinates in a polygon to pass to RS_Values to get all the pixel values in a polygon. Is there a better way to do this?
It's important to know which pixel came from where and to control precisely which pixels are inside and outside the polygon (a shift of 0.5 of a coordinate caused problems!)
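To illustrate why the 0.5 shift matters, here is a small plain-Scala sketch (no Sedona or JTS calls; PixelsInPolygon, contains and pixelsInside are hypothetical names). It assumes pixel (col, row) covers the unit square starting at (col, row), and makes the sampling point explicit: offset 0.5 tests the pixel center, offset 0.0 tests the pixel's corner. Points exactly on the polygon boundary are not handled specially.

```scala
object PixelsInPolygon {
  type Pt = (Double, Double)

  // Even-odd ray casting. Works for rings with or without a repeated closing point.
  def contains(ring: Seq[Pt], p: Pt): Boolean = {
    val (px, py) = p
    val edges = ring.zip(ring.tail :+ ring.head)
    val crossings = edges.count { case ((x1, y1), (x2, y2)) =>
      // The edge straddles the horizontal line through p, and crosses it to the right of p.
      (y1 > py) != (y2 > py) && px < (x2 - x1) * (py - y1) / (y2 - y1) + x1
    }
    crossings % 2 == 1
  }

  // Pixels of a width x height grid whose sampling point lies inside the ring.
  def pixelsInside(ring: Seq[Pt], width: Int, height: Int, offset: Double): Seq[(Int, Int)] =
    for {
      col <- 0 until width
      row <- 0 until height
      if contains(ring, (col + offset, row + offset))
    } yield (col, row)
}
```

For a square from 1.2 to 2.8 on a 4 x 4 grid, sampling at centers (offset 0.5) selects four pixels, while sampling at corners (offset 0.0) selects only (2, 2), so the two conventions disagree on the same polygon.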
Depending on what you want to do with the resulting pixel values, a few options:
@ricg72
Hi,
Thanks for the suggestions. I'll try and test RS_Clip (RS_AsRaster / RS_ZonalStats won't work). I suspect RS_Clip is going to cause performance issues because there are many geometries per image; the issue is how to prevent the image from being read multiple times or being shuffled. The only way to tell is to try!
Another approach might be to update RS_Values to take an array of polygons instead of just an array of points - it would need to return an array[array[pixel values]] so that we could tell which pixel values belong to which geometry.
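A rough sketch of the return shape I have in mind, in plain Scala rather than against the real RS_Values (which, as far as this thread goes, only takes points). Everything here is hypothetical: Box stands in for a polygon, the raster is a single band already in memory as raster(row)(col), and a pixel belongs to a geometry when its center (col + 0.5, row + 0.5) is covered.

```scala
object ValuesPerGeometry {
  // Hypothetical stand-in for a geometry: an axis-aligned box in pixel coordinates.
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def covers(x: Double, y: Double): Boolean =
      x >= minX && x <= maxX && y >= minY && y <= maxY
  }

  // One inner sequence per input box, in input order, so each value can be
  // traced back to the geometry it came from. The raster is only held once.
  def valuesPerBox(raster: Array[Array[Double]], boxes: Seq[Box]): Seq[Seq[Double]] =
    boxes.map { b =>
      for {
        row <- raster.indices
        col <- raster(row).indices
        if b.covers(col + 0.5, row + 0.5) // pixel-center test
      } yield raster(row)(col)
    }
}
```

A geometry that covers no pixel center gets an empty inner sequence rather than being dropped, so the outer array stays aligned with the input array of geometries.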
Expected behavior
ST_Pixelize returns 0 pixels
Actual behavior
ST_Pixelize throws an assertion error:
Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)
    at org.apache.spark.sql.sedona_viz.expressions.ST_Pixelize.eval(Pixelize.scala:119)
    at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)
Steps to reproduce the problem
import org.apache.spark.sql.functions.expr

case class A(g: String)

// change a0,a1 to values so an integer point lies between them
val a0 = "3.1"
val a1 = "3.8"
val d = Seq(A(s"POLYGON (($a0 $a0, $a1 $a0, $a1 $a1, $a0 $a1, $a0 $a0))"))

import spark.implicits._
val df = d.toDS().toDF()
  .withColumn("geo", expr("ST_GeomFromWKT(g)"))
  .withColumn("area", expr("ST_Area(geo)"))
df.select("geo", "area").show(false)

// show() evaluates ST_Pixelize and triggers the AssertionError
val df2 = df
  .withColumn("px", expr("ST_Pixelize(geo, 10,10, ST_PolygonFromEnvelope(0,0,10,10))"))
  .show(false)
Settings
Sedona version = 1.5.0
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Scala
Scala version = 2.12
JRE version = 1.8
Python version = not tested
Environment = Databricks?