girder / large_image

Python modules to work with large multiresolution images.
http://girder.github.io/large_image/
Apache License 2.0
190 stars 41 forks source link

Override outputPath for GDALFileTileSource.getRegion #682

Open banesullivan opened 2 years ago

banesullivan commented 2 years ago

Would it be okay to override outputPath for the GDAL tile source's getRegion?

https://github.com/girder/large_image/blob/d68aec03e303336231c1bce2f723d645c5a6a76a/sources/gdal/large_image_source_gdal/__init__.py#L1137

There are many circumstances where I want to store/use that file immediately and I don't want to have to copy the file out of a temp folder into my working space.

Also, I noticed once on my Mac that nearly immediately after the temp file was created by getRegion, it got deleted by the system as there was some sort of automatic cleanup.

manthey commented 2 years ago

We could just add another parameter to getRegion, that, if set, would use it as the output name.

banesullivan commented 2 years ago

In addition to this, would it be worth standardizing the output of getRegion? The fact that we have this bit of logic in RGD seems problematic to me: https://github.com/ResonantGeoData/ResonantGeoData/blob/d827fb78f7221f9888719f7c88e20cf7ae15bc9f/django-rgd-imagery/rgd_imagery/large_image_utilities.py#L30-L43

That is, different TileSource's getRegion methods return different things. And even the GDAL tile source has different return values depending on input parameters.

IMO, getRegion should return a Path to the extracted region in all cases but I imagine there is a lot of downstream code for non-geospatial use cases that assumes the image binary content is returned

manthey commented 2 years ago

getRegion originally just returned the region; this is particularly useful if you want a numpy array. The TILED output option creates a file and is the mechanism if the data is probably too large to fit in memory or you need tiled or georeferenced output.

This is the age-old library design question -- which parameters should be expressed as options to a method and which should be expressed as part of a method name? Internally, there is so much code overlap between output formats, so even if you had getRegionAsNumpyArray, getRegionAsTiledFile, getRegionAsJpegImage, getRegionAsTiffImage, getRegionAsPngImage, etc., internally it would be largely doing what it is doing now -- calling some common function with just a small amount of code special to the specific output. Additionally, if you want to expose all of those functions as endpoints, you then need more endpoints rather than fewer endpoints with more parameters.

If you always use TILED, you'll always get a tiff file. To unify RGD code paths, would you want to also have small images returned as files? Or something else?