Closed banesullivan closed 3 years ago
Based on the technical requirements, I may be misunderstanding the purpose of this feature.
ImageFile and ImageEntry is optional, then the caller must expect immediate results. Should this endpoint be idempotent and return a previously generated sub-sample if one exists?
SubsampledImage models that trigger jobs to be created. They return the model, which has a nullable URI reference to an ImageEntry which will be null. Hitting the same endpoint again (or maybe adding a get-by-id endpoint for subsampled imagery) will have to be tried until the job completes and the new URI populates. Is this correct?
Answers:
(for future) I am wondering if @manthey will consider adding support for taking geojson as input for getRegion?
Currently large_image getRegion can take coordinates in a variety of units (pixel space, physical distance in pixel space, any projection for geospatial entries). It can optionally scale the results. It currently only outputs a numpy array or an image and has the limitation that the entire region has to fit in memory (and, if outputting an image, the image has to be able to be created with PIL, which effectively limits it to 1 gigapixel). Some of the output can be transparent (or, more precisely, the output is likely to be in RGBA or LA color space).
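To make the options above concrete, here is a minimal sketch of how the arguments for such a call might be assembled. The helper name `build_region_kwargs` is hypothetical, and the key names (`region`, `left`/`top`/`right`/`bottom`, `units`, `output`, `maxWidth`) reflect my understanding of large_image's getRegion parameters, so check them against the large_image documentation before relying on them.

```python
def build_region_kwargs(left, top, right, bottom, units="base_pixels", max_width=None):
    """Assemble keyword arguments for a getRegion-style call.

    Hypothetical helper: the dictionary layout is an assumption about
    large_image's getRegion parameters, not something confirmed in this thread.
    """
    kwargs = {
        "region": {
            "left": left,
            "top": top,
            "right": right,
            "bottom": bottom,
            # e.g. "base_pixels" for pixel space; a projection name is assumed
            # to be accepted here for geospatial sources.
            "units": units,
        },
    }
    if max_width is not None:
        # Optionally scale the result down to at most this many pixels wide.
        kwargs["output"] = {"maxWidth": max_width}
    return kwargs


# A 512 x 256 pixel window, scaled down to at most 128 pixels wide.
example_kwargs = build_region_kwargs(0, 0, 512, 256, max_width=128)
```

Assuming a tile source object is already open, the result would be passed along the lines of `source.getRegion(format=..., **example_kwargs)`; the whole region still has to fit in memory, as noted above.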
To properly support RGD, we need to output a COG (not a tif through PIL) which includes geolocation data. For generality, for non-geospatial data large_image's getRegion should also be able to output pyramidal tiffs. Conceptually this isn't hard.
getRegion currently outputs a single image, I think with the limitation that the output is never more than 4 channels (RGBA), but I'll have to confirm that is true with numpy output as well. For hyperspectral files, this means that we might be picking no more than three channels (though you can composite the data as part of the process via the style options). For palettized files (e.g., land use data where the pixel values are categories), getRegion still outputs an RGBA image -- this means that the categorical information can be lost. If we need to preserve all of this, it means that when we output a region from large_image, the gdal tilesource might override the getRegion method when COG is requested and internally use GDAL to do the work. I think we'd only ever want to do this if no styling was applied (i.e., the bands aren't being remapped as part of the getRegion request).
If we add geoJSON region selection, I'd want to support that for all tile sources.
@banesullivan The PR on large_image is ready to be used and has basic tests. You can pull from that PR's branch to try things out.
Great! Thanks for the update, I will start testing this
Switch the subsampling endpoints/backend to leverage large_image rather than the code in rgd/geodata/models/imagery/subsample.py
large_image has a getRegion function. It can take the corner points (in pixel space or in a projection's coordinates) and the desired output resolution or magnification. It outputs a rectangular image or NumPy array.
Since getRegion doesn't handle GeoJSON in any way -- just min/max values in x and y in pixel or projection coordinates -- we will simply use the min/max bounds of the GeoJSON geometry for these endpoints.
Work to be done in RGD
api/geoprocess/imagery/<int:pk>/region/world/<xmin>/<xmax>/<ymin>/<ymax>
api/geoprocess/imagery/<int:pk>/region/pixel/<umin>/<umax>/<vmin>/<vmax>
large_image's getRegion to handle generating a subsampled image from the given parameters
ImageFile and ImageEntry. Note that this issue is why I added SubsampledImage in the first place
api/geoprocess/imagery/<int:pk>/region/annotation/<int:annotation_id>
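For the annotation endpoint, reducing a GeoJSON geometry to its min/max bounds could look roughly like the sketch below. The function names are hypothetical and this is not the RGD implementation; it assumes the geometry is a plain GeoJSON dict and ignores any third (elevation) coordinate.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z].
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def positions_of(geometry):
    """Yield every (x, y) position in a GeoJSON geometry dict."""
    if geometry["type"] == "GeometryCollection":
        for member in geometry["geometries"]:
            yield from positions_of(member)
    else:
        yield from iter_positions(geometry["coordinates"])


def geometry_bounds(geometry):
    """Return (xmin, xmax, ymin, ymax), the order used by the region endpoints."""
    positions = list(positions_of(geometry))
    if not positions:
        raise ValueError("geometry has no coordinates")
    xs, ys = zip(*positions)
    return min(xs), max(xs), min(ys), max(ys)


polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-105.0, 39.5], [-104.5, 39.5], [-104.5, 40.0], [-105.0, 40.0], [-105.0, 39.5],
    ]],
}
bounds = geometry_bounds(polygon)  # (-105.0, -104.5, 39.5, 40.0)
```

The resulting tuple can then be treated the same way as the values in the world-coordinate endpoint, so the annotation case needs no special handling beyond this step.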
Work to be done directly in large_image
getRegion() doesn't have any geospatial information attached to it -- we need to set it to output the region as a Cloud Optimized GeoTiff (or regular GeoTiff)