Open CICDamen opened 1 year ago
Hi @casperdamen123 thank you for reporting the issue. Could you please confirm in which CRS are your geometries, so that we could reproduce the problem in tests and produce a fix.
Hi @milos-colic, you're welcome, thanks for investigating!
The geometries are in https://epsg.io/28992 and we set this using st_setsrid(st_geomfromwkb(geometrie), 28992))
@casperdamen123 thanks for confirming this. H3 is only aware of geometries in 4326 and silently fails (returns an empty set) if geometries aren't in 4326. We plan to add support for handling this automatically in future releases (ETA one or two versions). However, at the moment you'd need to transform geometries to 4326 before tessellation. You can use https://databrickslabs.github.io/mosaic/api/spatial-functions.html#st-transform to do so. I will open a ticket on our internal dev JIRA for adding support for this.
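A minimal PySpark sketch of that workaround, assuming Mosaic has already been enabled in the session and that the input column holds WKB in EPSG:28992. The helper name `tessellate_in_wgs84`, the output column names, and the resolution value are illustrative and not taken from the thread:

```python
def tessellate_in_wgs84(df, wkb_col="geometrie", source_srid=28992, resolution=9):
    """Reproject geometries to EPSG:4326, then tessellate them with the H3 grid.

    Hypothetical helper: assumes `mosaic` and `pyspark` are installed and that
    mos.enable_mosaic(...) has already been called for the active session.
    """
    # Imported lazily so the helper can be defined outside a Spark environment.
    from pyspark.sql import functions as F
    import mosaic as mos

    # Tag the geometry with its source CRS; st_setsrid does not move coordinates.
    geom = mos.st_setsrid(mos.st_geomfromwkb(F.col(wkb_col)), F.lit(source_srid))
    # Reproject the coordinates to lon/lat degrees.
    geom_4326 = mos.st_transform(geom, F.lit(4326))

    return (
        df.withColumn("geom_4326", geom_4326)
          .withColumn("idx", mos.grid_tessellate(F.col("geom_4326"), F.lit(resolution)))
    )
```

Something like `tessellate_in_wgs84(dataf).display()` would then mirror the notebook snippet in this thread, with the transform step added before tessellation.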
@milos-colic, thanks for your response.
I'm not sure if I fully understand, does this fix then only relate to the SQL bindings?
Because when using the grid_tessellate function in PySpark, I do get valid chips returned, even when I'm using another CRS.
See below example:
(dataf
.withColumn("geom", mos.st_setsrid(mos.st_geomfromwkb(F.col("geometrie")), F.lit(28992)))
.withColumn("idx", mos.grid_tessellate(F.col("geom"), F.lit(2)))
).display()
Could this maybe be related to the order of registering the SQL functions as UDFs relative to the setting of the custom grid?
@casperdamen123 Apologies for the delay in coming back to you on this.
Could you confirm that the coordinates of the geometries that return non-empty chips are larger than 180/90? It may be that values near the origin are valid with respect to the H3 domain.
For other CRSs, please use a custom grid instead of H3. H3 is only intended for 4326; in other CRSs it will only produce chips for the part of the geometry that fits in the -180/180 and -90/90 bounding box.
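To illustrate the bounding-box point, here is a small plain-Python check (not part of Mosaic; the function name and sample coordinates are illustrative). It shows why typical EPSG:28992 coordinates, which are in metres, fall outside the lon/lat box that H3 expects:

```python
def within_lon_lat_bounds(x: float, y: float) -> bool:
    """Return True if (x, y) fits inside the -180..180 / -90..90 box."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Illustrative EPSG:28992 easting/northing in metres (near Amersfoort).
rd_point = (155000.0, 463000.0)
# Approximately the same location expressed as lon/lat degrees (EPSG:4326).
wgs84_point = (5.387, 52.155)

print(within_lon_lat_bounds(*rd_point))     # outside the H3 domain
print(within_lon_lat_bounds(*wgs84_point))  # inside the H3 domain
```

Running a check like this over the vertex coordinates of the geometries that do return chips would answer the question above.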
Custom grid docs: https://databrickslabs.github.io/mosaic/api/spatial-indexing.html
@milos-colic Thanks for coming back on this issue!
Actually, when using the SQL bindings, all chips that are returned are empty.
As mentioned in one of my earlier comments, we are using the custom grid below:
spark.databricks.labs.mosaic.index.system CUSTOM(0, 310000, 280000, 640000, 2, 2000, 2000)
Could this maybe be related to the order of registering the SQL functions as UDFs relative to the setting of the custom grid?
To be clear, the same functionality does work with the above grid setting when using the Python bindings. It would be really nice if we could use the tessellation SQL bindings, so that we can also store the logic in views, for example.
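As a rough sanity check on the grid definition itself, the setting can be parsed and compared against sample coordinates. This is a plain-Python sketch, not Mosaic code, and it assumes the parameter order is CUSTOM(xMin, xMax, yMin, yMax, splits, rootCellSizeX, rootCellSizeY); the class and function names are hypothetical:

```python
import re
from typing import NamedTuple


class CustomGrid(NamedTuple):
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    splits: int
    root_cell_size_x: float
    root_cell_size_y: float


def parse_custom_grid(value: str) -> CustomGrid:
    """Parse a 'CUSTOM(...)' index-system string (assumed parameter order)."""
    match = re.fullmatch(r"\s*CUSTOM\((.*)\)\s*", value)
    if match is None:
        raise ValueError(f"not a CUSTOM(...) index system: {value!r}")
    parts = [p.strip() for p in match.group(1).split(",")]
    if len(parts) != 7:
        raise ValueError(f"expected 7 parameters, got {len(parts)}")
    x_min, x_max, y_min, y_max = (float(p) for p in parts[:4])
    return CustomGrid(x_min, x_max, y_min, y_max,
                      int(parts[4]), float(parts[5]), float(parts[6]))


def covers(grid: CustomGrid, x: float, y: float) -> bool:
    """Return True if (x, y) lies inside the grid extent."""
    return grid.x_min <= x <= grid.x_max and grid.y_min <= y <= grid.y_max


grid = parse_custom_grid("CUSTOM(0, 310000, 280000, 640000, 2, 2000, 2000)")
print(covers(grid, 155000.0, 463000.0))  # sample EPSG:28992 point
```

In a notebook, the string passed to `parse_custom_grid` could be read back with `spark.conf.get("spark.databricks.labs.mosaic.index.system")` just before running the SQL query, to see which grid setting the session actually reports.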
Describe the bug
When using the grid_tessellate function in SQL on valid geometry, it returns empty chips.
To Reproduce
Steps to reproduce the behavior:
Use the grid_tessellate function on valid geometry data to create a new column
Expected behavior
I would expect the chips to always contain data and not be empty
Screenshots
Additional context
I've tried switching the following Spark config settings on/off (both in the notebook and in the cluster during startup):
spark.databricks.mosaic.geometry.api ESRI
spark.databricks.labs.mosaic.index.system CUSTOM(0, 310000, 280000, 640000, 2, 2000, 2000)
I can imagine that the custom index system is important, but I am unsure where/when to set it to make sure the correct grid is used.
Other SQL functions like st_centroid, st_setsrid, st_geomfromwkb, and st_aswkb work as expected.