databrickslabs / mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
https://databrickslabs.github.io/mosaic/

raster_to_grid API is returning "zero" values in measure column #428

Open motomohg opened 1 year ago

motomohg commented 1 year ago

I'm using mos.read().format("raster_to_grid") to read a raster file and extract H3 cell indices and measure values. The H3 cell indices come back as expected, but the measure value is zero for every index.

Code snippet which returns zero (measure value):

resolution = 8
raster_grid_df = (mos.read().format("raster_to_grid")
    .option("fileExtension", "*.tif")
    .option("resolution", f"{resolution}")
    .option("kRingInterpolate", f"{resolution}")
    .load(source_file_path)
)
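
To confirm that every measure really is zero (and not null, or zero only in the rows that happened to be displayed), a small check like the one below may help. It is a minimal sketch, not part of Mosaic: `summarize_measure` is a hypothetical helper, the `measure` column name is taken from the issue title, and the `cell_id` field in the sample data is an assumption used only for illustration.

```python
def summarize_measure(rows, column="measure"):
    """Summarize one numeric column from an iterable of row-like mappings.

    Works with plain dicts and with anything else that supports row[column].
    """
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "zeros": sum(1 for v in non_null if v == 0),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


# Illustrative sample only; these are not values from the actual raster.
sample = [
    {"cell_id": 1, "measure": 0.0},
    {"cell_id": 2, "measure": 0.0},
    {"cell_id": 3, "measure": None},
]
summary = summarize_measure(sample)
print(summary)
```

Against the real result, the rows could come from a bounded sample such as `raster_grid_df.select("measure").limit(1000).collect()`, since PySpark `Row` objects support lookup by column name. If `zeros` equals `rows` and `min` equals `max`, the zeros are in the data itself and are not a display artifact.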

Since I was not getting the measure values, I tried retiling the raster. The code snippet below ran for almost 2 hours without returning a result, at which point I killed the process. The raster file I'm using is only 167 MB.

resolution = 8
raster_grid_df = (mos.read().format("raster_to_grid")
    .option("fileExtension", "*.tif")
    .option("resolution", f"{resolution}")
    .option("retile", "true")
    .option("tileSize", "512")
    .option("kRingInterpolate", f"{resolution}")
    .load(source_file_path)
)

I'm not sure if something is wrong with my implementation.

mjohns-databricks commented 1 year ago

@motomohg we have a bundle of improvements coming with Mosaic 0.3.12, to be released very soon. Look forward to connecting with you on this soon!

motomohg commented 1 year ago

Thanks @mjohns-databricks for looking into it.