apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0

ST_Pixelize small polygon error #1272

Open ricg72 opened 7 months ago

ricg72 commented 7 months ago

Expected behavior

ST_Pixelize returns 0 pixels

Actual behavior

ST_Pixelize throws an assertion error:

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)
    at org.apache.spark.sql.sedona_viz.expressions.ST_Pixelize.eval(Pixelize.scala:119)
    at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)

Steps to reproduce the problem

import org.apache.spark.sql.functions.expr

case class A(g: String)
val a0 = "3.1" // change a0,a1 to values so an integer point lies between them
val a1 = "3.8"
val d = Seq(
  A(s"POLYGON (($a0 $a0, $a1 $a0, $a1 $a1, $a0 $a1, $a0 $a0))"))
import spark.implicits._
val df = d.toDS().toDF()
  .withColumn("geo", expr("ST_GeomFromWKT(g)"))
  .withColumn("area", expr("ST_Area(geo)"))
df.select("geo", "area").show(false)

val df2 = df
  .withColumn("px", expr("ST_Pixelize(geo, 10,10, ST_PolygonFromEnvelope(0,0,10,10))"))
  .show(false)

Settings

Sedona version = 1.5.0
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Scala
Scala version = 2.12
JRE version = 1.8
Python version = not tested
Environment = Databricks?

ricg72 commented 7 months ago

I suspect the assert at line 119 of https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_viz/expressions/Pixelize.scala: assert(pixels.size() > 0)

jiayuasu commented 7 months ago

@ricg72 ST_Pixelize is not supposed to return 0 pixels. For any geometry (polygons, points, ...), it should return at least 1 pixel. There might be something wrong with the logic itself. Do you want to take a stab?

ricg72 commented 7 months ago

Hi, what would the spec be? If the object falls within a single pixel, then displaying that single pixel is OK. It's the long, thin items that I am not clear how to display. Maybe the algorithm could convert the polygon to a line (skeletonize?) and then draw those pixels?
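
One possible reading of "at least 1 pixel" for a geometry that sits entirely inside one grid cell is sketched below. This is a hypothetical illustration in plain Scala, not Sedona's ST_Pixelize implementation: `Box`, `PixelSketch`, `nodesInside`, and `pixelsAtLeastOne` are made-up names, geometries are simplified to axis-aligned boxes, and the idea that the empty result comes from no grid node falling inside the geometry is only an assumption drawn from the reproduction above.

```scala
// Hypothetical sketch, NOT Sedona's implementation.
// Geometries are simplified to axis-aligned boxes.
final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double)

object PixelSketch {
  /** Grid nodes (i, j) that lie inside `geom`; empty when `geom` falls between nodes. */
  def nodesInside(geom: Box, boundary: Box, resX: Int, resY: Int): Seq[(Int, Int)] = {
    val w = (boundary.maxX - boundary.minX) / resX // pixel width
    val h = (boundary.maxY - boundary.minY) / resY // pixel height
    for {
      i <- 0 to resX
      j <- 0 to resY
      x = boundary.minX + i * w
      y = boundary.minY + j * h
      if x >= geom.minX && x <= geom.maxX && y >= geom.minY && y <= geom.maxY
    } yield (i, j)
  }

  /** Same as `nodesInside`, but falls back to the pixel containing the box centre. */
  def pixelsAtLeastOne(geom: Box, boundary: Box, resX: Int, resY: Int): Seq[(Int, Int)] = {
    val hits = nodesInside(geom, boundary, resX, resY)
    if (hits.nonEmpty) hits
    else {
      val w = (boundary.maxX - boundary.minX) / resX
      val h = (boundary.maxY - boundary.minY) / resY
      val cx = (geom.minX + geom.maxX) / 2
      val cy = (geom.minY + geom.maxY) / 2
      // Clamp so a centre on the outer edge still maps to a valid pixel index.
      val i = math.min(resX - 1, math.max(0, math.floor((cx - boundary.minX) / w).toInt))
      val j = math.min(resY - 1, math.max(0, math.floor((cy - boundary.minY) / h).toInt))
      Seq((i, j))
    }
  }
}
```

With the 10 x 10 grid over (0,0)-(10,10) from the reproduction, the box from 3.1 to 3.8 contains no grid node, so the fallback would report the single pixel at index (3, 3). Long, thin geometries would still need a separate rule, such as tracing the centre line.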

I was actually trying to use ST_Pixelize to get all the pixel coordinates in a polygon to pass to RS_Values to get all the pixel values in a polygon. Is there a better way to do this?

It's important to know which pixel came from where, and to control precisely which pixels are inside and outside the polygon (a shift of 0.5 of a coordinate caused problems!)

jiayuasu commented 7 months ago

Depending on what you want to do with the resulting pixel values, a few options:

  1. RS_Clip: Clip/Crop the image by the given geometry
  2. RS_AsRaster: Rasterize a geometry to a raster using a reference raster. Given two rasters, you can use RS_MapAlgebra to perform arbitrary operations on the values of the two rasters
  3. RS_ZonalStats: Calculate aggregate values of the pixels inside a given geometry
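
As a rough illustration of option 3, the sketch below joins rasters to geometries and computes a per-geometry mean. It assumes a Sedona release that provides RS_Intersects and RS_ZonalStats, that the Sedona SQL functions are already registered on the session, and that the raster and the geometries share a CRS. The DataFrame names, the column names `rast` and `geo`, the helper name `zonalMeans`, and the choice of band 1 are all hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

// Hypothetical helper: `rasters` is assumed to have a raster column `rast`,
// `zones` a geometry column `geo`. Neither name comes from the issue.
def zonalMeans(rasters: DataFrame, zones: DataFrame): DataFrame =
  rasters
    // Pair each raster only with the geometries that intersect it.
    .join(zones, expr("RS_Intersects(rast, geo)"))
    // Mean of band 1 over the pixels inside each geometry.
    .withColumn("mean_band1", expr("RS_ZonalStats(rast, geo, 1, 'mean')"))
```

Each output row keeps its `geo` column, so the aggregate can be traced back to the geometry it came from.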

jiayuasu commented 7 months ago

@ricg72

ricg72 commented 7 months ago

Hi,

Thanks for the suggestions. I'll try and test RS_Clip (RS_AsRaster and RS_ZonalStats won't work). I suspect RS_Clip is going to cause performance issues because there are many geometries per image; the issue is how to prevent the image from being read multiple times or being shuffled. The only way to tell is to try!

Another approach might be to update RS_Values to take an array of polygons instead of just an array of points - it would need to return an array[array[pixel values]] so that we could tell which pixel values belong to which geometry.
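
The return shape proposed above could look roughly like the following plain-Scala sketch. It is not an existing Sedona function: `Rect`, `ValuesPerGeometry`, and `valuesPerGeometry` are hypothetical names, geometries are simplified to axis-aligned boxes, and a pixel is assumed to belong to a geometry when its centre lies inside it.

```scala
// Hypothetical sketch of the proposed array[array[pixel values]] result.
final case class Rect(minX: Double, minY: Double, maxX: Double, maxY: Double)

object ValuesPerGeometry {
  /**
   * grid(row)(col) is the value of the pixel whose centre is (col + 0.5, row + 0.5).
   * Returns one inner sequence per zone, in the same order as `zones`.
   */
  def valuesPerGeometry(grid: Vector[Vector[Double]], zones: Seq[Rect]): Seq[Seq[Double]] =
    zones.map { z =>
      for {
        (row, r) <- grid.zipWithIndex
        (v, c) <- row.zipWithIndex
        cx = c + 0.5
        cy = r + 0.5
        if cx >= z.minX && cx <= z.maxX && cy >= z.minY && cy <= z.maxY
      } yield v
    }
}
```

Because the outer sequence follows the input order, the i-th inner sequence always belongs to the i-th geometry, and a geometry that covers no pixel centre yields an empty inner sequence rather than an error.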