SlideRuleEarth / sliderule

Server and client framework for on-demand science data processing in the cloud
https://slideruleearth.io

Rasterized subsetting takes substantially longer than polygon-based subsetting #333

Closed: jpswinski closed this issue 8 months ago

jpswinski commented 10 months ago

When making a request to subset ATL03 data, if the request includes the "raster" keyword, which indicates that the data should be subsetted using a rasterized version of the polygon, the subsetting request takes about 6x longer.
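
To make the two subsetting strategies concrete, here is a minimal sketch, assuming a simple (x, y) ring of vertices and a fixed cell size. The names `point_in_polygon`, `burn_polygon`, and `raster_includes` are hypothetical; this is not SlideRule's GeoJsonRaster code. A polygon check costs one pass over the vertices per photon. The raster approach "burns" the polygon into a grid once, after which each photon check should in principle be a single index computation and array read. That expectation is why a 6x slowdown is surprising.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def burn_polygon(poly: List[Point], cellsize: float):
    """Rasterize once: a cell is 'in' when its center passes the polygon test."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    min_x, max_y = min(xs), max(ys)
    cols = int((max(xs) - min_x) / cellsize) + 1
    rows = int((max_y - min(ys)) / cellsize) + 1
    mask = [[point_in_polygon(min_x + (c + 0.5) * cellsize,
                              max_y - (r + 0.5) * cellsize, poly)
             for c in range(cols)]
            for r in range(rows)]
    return mask, (min_x, max_y, cellsize)


def raster_includes(mask, origin, x: float, y: float) -> bool:
    """Per-point check after burning: one index computation and one array read."""
    min_x, max_y, cellsize = origin
    c = int((x - min_x) // cellsize)
    r = int((max_y - y) // cellsize)
    if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
        return mask[r][c]
    return False


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
mask, origin = burn_polygon(square, 1.0)
print(raster_includes(mask, origin, 5.2, 5.7))   # True
print(raster_includes(mask, origin, 15.0, 5.0))  # False
```

The one-time burn costs roughly cells × vertices polygon tests. It pays off only when the later per-point lookup really is cheap.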

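The following request takes ~18 seconds:

    from sliderule import icesat2

    # Assumes the client is initialized and that `region` (with 'poly' and
    # 'raster' entries) and `args.granule03` (an ATL03 granule) are defined earlier
    parms = {
        "poly": region['poly'],
        "raster": region['raster'],      # subset using the rasterized polygon
        "srt": icesat2.SRT_LAND,         # surface type used for signal confidence
        "cnf": icesat2.CNF_SURFACE_LOW,  # minimum photon confidence
        "ats": 20.0,                     # minimum along-track spread (m)
        "cnt": 10,                       # minimum photon count per extent
        "len": 40.0,                     # extent length (m)
        "res": 20.0 }                    # step between extents (m)
    icesat2.atl03sp(parms, resources=[args.granule03])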

The following request takes ~3 seconds:

    # Same request without the "raster" entry (polygon-only subsetting)
    parms = {
        "poly": region['poly'],
        "srt": icesat2.SRT_LAND,
        "cnf": icesat2.CNF_SURFACE_LOW,
        "ats": 20.0,
        "cnt": 10,
        "len": 40.0,
        "res": 20.0 }
    icesat2.atl03sp(parms, resources=[args.granule03])

jpswinski commented 10 months ago

Drilling down, it looks like burning the GeoJSON polygon into a raster is very fast, but the per-point inclusion check is very slow. Looking at the code, the check goes through the same getSamples() call that all other GeoRaster and GeoIndexedRaster rasters use. I wonder if the includes() function could convert the coordinates to pixel coordinates itself and look up the pixel directly in memory.
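
The direct lookup suggested here amounts to inverting the raster's affine geotransform and indexing into a buffer. Below is a minimal sketch, assuming a GDAL-style, north-up geotransform (rotation terms zero) and a row-major buffer of 0/1 values already held in memory. `world_to_pixel` and `includes` are hypothetical names, not SlideRule's API. For a north-up raster, col = floor((x - origin_x) / pixel_width) and row = floor((y - origin_y) / gt[5]), where gt[5] is the negative pixel height.

```python
import math
from typing import Sequence, Tuple

# GDAL-style affine geotransform for a north-up raster (rotation terms are zero):
# (origin_x, pixel_width, 0.0, origin_y, 0.0, -pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def world_to_pixel(gt: GeoTransform, x: float, y: float) -> Tuple[int, int]:
    """Invert the geotransform to get the (col, row) of the pixel containing (x, y)."""
    if gt[2] != 0.0 or gt[4] != 0.0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = math.floor((x - gt[0]) / gt[1])
    row = math.floor((y - gt[3]) / gt[5])  # gt[5] < 0, so row grows southward
    return col, row


def includes(buf: Sequence[int], cols: int, rows: int,
             gt: GeoTransform, x: float, y: float) -> bool:
    """Direct lookup in a row-major 0/1 buffer; no per-call sampling machinery."""
    col, row = world_to_pixel(gt, x, y)
    if not (0 <= col < cols and 0 <= row < rows):
        return False  # point falls outside the burned raster
    return buf[row * cols + col] != 0


gt = (100.0, 2.0, 0.0, 50.0, 0.0, -2.0)  # 2-unit pixels, top-left corner at (100, 50)
buf = [0] * 16                           # 4 x 4 raster with one "inside" pixel
buf[2 * 4 + 2] = 1
print(world_to_pixel(gt, 105.0, 45.0))       # (2, 2)
print(includes(buf, 4, 4, gt, 105.0, 45.0))  # True
print(includes(buf, 4, 4, gt, 109.0, 49.0))  # False (col 4 is out of range)
```

Compared with a generic sampling path, the per-point work here is two subtractions, two divisions, two floors, a bounds check, and one memory read.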

jpswinski commented 8 months ago

Fixed in v4.0.4: GeoJsonRaster was reworked to create a local buffer (subset) of pixels that the "includes" call accesses directly.
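
One plausible shape for such a local buffer is sketched below. This is not the GeoJsonRaster implementation. It assumes a north-up 0/1 mask with square cells, and it assumes the subset is the window covering a bounding box given as (min_x, min_y, max_x, max_y). `LocalMaskBuffer` is a hypothetical name. The window is copied once, and includes() then indexes it directly.

```python
import math
from typing import List, Tuple


class LocalMaskBuffer:
    """Hypothetical sketch: keep only the window of a larger 0/1 mask that
    covers a bounding box, and answer includes() by direct indexing."""

    def __init__(self, full_mask: List[List[int]], origin_x: float, origin_y: float,
                 cellsize: float, bbox: Tuple[float, float, float, float]):
        min_x, min_y, max_x, max_y = bbox
        rows, cols = len(full_mask), len(full_mask[0])
        # Pixel window covering the bbox (start inclusive, end exclusive), clamped to the mask
        self.col0 = max(0, math.floor((min_x - origin_x) / cellsize))
        self.row0 = max(0, math.floor((origin_y - max_y) / cellsize))
        col1 = min(cols, math.floor((max_x - origin_x) / cellsize) + 1)
        row1 = min(rows, math.floor((origin_y - min_y) / cellsize) + 1)
        # Copy the subset once so later lookups touch only this small buffer
        self.buf = [row[self.col0:col1] for row in full_mask[self.row0:row1]]
        self.origin_x, self.origin_y, self.cellsize = origin_x, origin_y, cellsize

    def includes(self, x: float, y: float) -> bool:
        # Full-raster pixel coordinates, shifted into the local window
        col = math.floor((x - self.origin_x) / self.cellsize) - self.col0
        row = math.floor((self.origin_y - y) / self.cellsize) - self.row0
        if 0 <= row < len(self.buf) and 0 <= col < len(self.buf[row]):
            return self.buf[row][col] != 0
        return False


full = [[0] * 10 for _ in range(10)]
for r in (3, 4):
    for c in (5, 6):
        full[r][c] = 1
window = LocalMaskBuffer(full, 0.0, 10.0, 1.0, (4.0, 5.0, 8.0, 8.0))
print(len(window.buf), len(window.buf[0]))  # 4 5
print(window.includes(5.5, 6.5))            # True
print(window.includes(0.5, 0.5))            # False
```

Because the window is clamped to the mask's extent, points outside the bounding box fall through to False without touching the full raster.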