ResonantGeoData / RGD-ScrumBoard

Broad tasking across RGD projects

Use large_image to subsample imagery #6

Closed banesullivan closed 3 years ago

banesullivan commented 3 years ago

Switch the subsampling endpoints/backend to leverage large_image rather than the code in rgd/geodata/models/imagery/subsample.py


large_image has a getRegion function. It can take the corner points (in pixel space or in a projection's coordinates) and the desired output resolution or magnification. It outputs a rectangular image or NumPy array.

getRegion doesn't handle GeoJSON in any way; it only accepts min/max values in x and y, in pixel or projection coordinates. For these endpoints, we will simply use the min/max bounds of the GeoJSON geometry.
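A minimal sketch of that bounds step, assuming the geometry arrives as a plain GeoJSON dict. The helper names (`iter_positions`, `geometry_bounds`, `region_from_geometry`) are hypothetical and not part of RGD or large_image. The `left`/`top`/`right`/`bottom`/`units` keys follow the `region` argument of getRegion as I understand it, and the right `units` string for a given source should be checked against the large_image docs.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for child in coords:
            yield from iter_positions(child)


def geometry_bounds(geometry):
    """Return (xmin, ymin, xmax, ymax) for a GeoJSON geometry dict."""
    if geometry['type'] == 'GeometryCollection':
        positions = [
            p for g in geometry['geometries'] for p in iter_positions(g['coordinates'])
        ]
    else:
        positions = list(iter_positions(geometry['coordinates']))
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return min(xs), min(ys), max(xs), max(ys)


def region_from_geometry(geometry, units):
    """Build a getRegion-style region dict from the geometry's bounding box.

    `units` must name the coordinate system the GeoJSON is expressed in;
    the exact strings accepted are an assumption to verify in large_image.
    """
    xmin, ymin, xmax, ymax = geometry_bounds(geometry)
    # In projected/geographic coordinates, "top" is the larger y value.
    return {'left': xmin, 'right': xmax, 'top': ymax, 'bottom': ymin, 'units': units}


# Hypothetical usage with an already opened large_image tile source `source`
# (not executed here):
# data, mime = source.getRegion(region=region_from_geometry(geom, 'EPSG:4326'))
```

Note that anything outside the polygon but inside its bounding box is still returned; the geometry is only used to derive the rectangle.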

Work to be done in RGD

Work to be done directly in large_image

subdavis commented 3 years ago

Based on the technical requirements, I may be misunderstanding the purpose of this feature.

subdavis commented 3 years ago

Answers:

aashish24 commented 3 years ago

(For the future) I am wondering whether @manthey would consider adding support for GeoJSON as input to getRegion?

manthey commented 3 years ago

Currently large_image getRegion can take coordinates in a variety of units (pixel space, physical distance in pixel space, any projection for geospatial entries). It can optionally scale the results. It currently only outputs a numpy array or an image and has the limitation that the entire region has to fit in memory (and, if outputting an image, the image has to be able to be created with PIL, which effectively limits it to 1 gigapixel). Some of the output can be transparent (or, more precisely, the output is likely to be in RGBA or LA color space).
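To make the memory limitation concrete, here is a rough back-of-the-envelope sketch. The helper `estimate_region` is hypothetical, the 1 gigapixel figure is taken from the comment above rather than from PIL itself, and the default of four 8-bit channels assumes RGBA output.

```python
# "Effectively 1 gigapixel", per the comment above; an approximation, not a PIL constant.
PIL_PIXEL_LIMIT = 1_000_000_000


def estimate_region(width, height, channels=4, bytes_per_sample=1):
    """Estimate the uncompressed size of a requested region.

    Defaults assume 8-bit RGBA output; adjust for other color spaces or dtypes.
    """
    pixels = width * height
    return {
        'pixels': pixels,
        'bytes': pixels * channels * bytes_per_sample,
        'fits_pil': pixels <= PIL_PIXEL_LIMIT,
    }


# A 40000 x 30000 request is 1.2 gigapixels, or about 4.8 GB as 8-bit RGBA.
print(estimate_region(40000, 30000))
```

A check like this could run before calling getRegion, so that oversized requests are rejected or downscaled rather than failing partway through.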

To properly support RGD, we need to output a COG (not a tif through PIL) which includes geolocation data. For generality, for non-geospatial data large_image's getRegion should also be able to output pyramidal tiffs. Conceptually this isn't hard.

getRegion currently outputs a single image. I think it has the limitation that the output never has more than 4 channels (RGBA), but I'll have to confirm that this is also true of numpy output. For hyperspectral files, this means that we might be picking no more than three channels (though you can composite the data as part of the process via the style options). For palettized files (e.g., land use data where the pixel values are categories), getRegion still outputs an RGBA image -- this means that the categorical information can be lost. If we need to preserve all of this, then when we output a region from large_image, the gdal tilesource might override the getRegion method when COG is requested and internally use GDAL to do the work. I think we'd only ever want to do this if no styling was applied (i.e., the bands aren't being remapped as part of the getRegion request).

If we add geoJSON region selection, I'd want to support that for all tile sources.

manthey commented 3 years ago

See https://github.com/girder/large_image/issues/567 and https://github.com/girder/large_image/issues/566

manthey commented 3 years ago

See https://github.com/girder/large_image/pull/594.

manthey commented 3 years ago

@banesullivan The PR on large_image is ready to be used and has basic tests. You can pull from that PR's branch to try things out.

banesullivan commented 3 years ago

Great! Thanks for the update, I will start testing this.

banesullivan commented 3 years ago

WIP in https://github.com/ResonantGeoData/ResonantGeoData/pull/346