Permian-Global-Research / rsi

Code for Retrieving STAC Information, addressing Repeated Spatial Infelicities and interfacing with Rsome Spectral Indices
https://permian-global-research.github.io/rsi/
Apache License 2.0
43 stars 5 forks source link

Changes for mdsumner review #78

Closed mikemahoney218 closed 3 weeks ago

mikemahoney218 commented 1 month ago

Please make mention somewhere that "where the compute runs" (i.e. what "local" means) is important here. There's an issue with performance in different regions - and some pointers to how one might run in a region closer to where the usual data sources are (us-west2 for example. I'm not suggesting that's an easy topic to cover, just would like to see it mentioned. It takes about 2x as long to run examples here in Tasmania, vs computers in us-west (I know that comparison is a lot more complex than this). It might be an idea to point to ways of running computers on public systems that run closer to the data, or at least a guide to that.

I added a small mention of this to the README and a larger mention to the Downloading vignette: https://github.com/Permian-Global-Research/rsi/pull/78/commits/55ad0d957bd54a4af9a4300803b4f412d6950684

I'd like see some validations of the raster values obtained in some examples, if there was an external example that's documented elsewhere (in a python notebook, or another R package) it would be neat to have a clear comparison of the scale/s shown be these raster files obtained here against an independent example. (I had intended to do this myself ...)

I was looking for an example where I can make a true colour image with RGB, I don't understand the scaling that occurs in the examples (we have 0-1 scale numbers, which don't work with terra::plotRGB or its stretch argument off the shelf, and leave me unclear about the scaled ranges and whether I am plotting an RGB image correctly. The first example in the readme plots an RGB image as separate bands, and I'm unclear what to do to plot that as an "image". Again, I had meant to provide an actual example here. I believe the advice should be:

plotRGB(stretch(x))

## note that  this provides different defaults to stretch() itself
plotRGB(x, stretch = "lin")

but also, it's open to interpretation and expert use afaics.

Added to the "How can I" vignette: https://github.com/Permian-Global-Research/rsi/pull/78/commits/528e31f7d62045471ac9233ae8934bfa4da7d445

And to the README and the get_stac_data() examples: https://github.com/Permian-Global-Research/rsi/pull/78/commits/7c2c0cfd8bbddc5f1a519aa18eb829a2afdfdd4f

Specific notes

Two of these three files have no data in the aoi region.

qfiles <- get_landsat_imagery(aoi, 
 start_date = "2022-06-01", end_date = "2022-06-30", 
composite_function = NULL)

I tried this naive thing, to run rsi_query_api, and I think at the least it should error fast when not getting a bbox in longlat.

it could detect that the bbox input is not in longlat and 1) error or 2) just transform it.

Fixed in https://github.com/Permian-Global-Research/rsi/pull/78/commits/d0acb46a87317cf643b630c926ca444caabeb3e8 . The documentation was also just wrong here; this function needed a bbox, not an sfc object. I changed things so either works. You can tell this function was pulled out from get_stac_data() relatively late in the game :sweat_smile:

It's often useful to buffer your aoi object slightly, on the order of 1-2 cell widths, in order to ensure that data is downloaded for your entire AOI even after accounting for any reprojection needed to compare your AOI to the data on the STAC server.

Here I would rather reproject the extent, this is not so hard to do and exists in a few places raster::projectExtent, reproj::reproj_extent, and terra::project but possibly the best option is to use GDAL warp (via sf as elsewhere here) to reproject an empty raster. (Happy to follow up to illustrate).

This function can either download all data that intersects with your spatiotemporal AOI as multiple files (if composite_function = NULL), or can be used to rescale band values, apply a mask function, and create a composite from the resulting files in a single function call

On this point, GDAL warp can itself do these things, itself a monolith of abstractions itself but we can possibly avoid downloading entire scenes. I mention this not as a strong suggestion, mainly an invite to discuss further (I'm looking at the rise of {gdalraster} here).

Unfortunately I don't think gdalwarp can handle complicated compositing yet: https://github.com/OSGeo/gdal/issues/5176 And I haven't found any way to make it handle masking, either.

With regards to simple composites ("latest pixel wins" style), this is one of the messiest parts of get_stac_data(), but we actually do use gdalwarp directly to handle those. Specifically, these lines check if we can get away with using the warper to stamp a bunch of images together:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L366-L369

The output of that gets passed as merge to the download function:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L374-L383

Then this is where things get really silly: if merge == TRUE, we only create a single output file for each asset: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L64-L76

And then we wind up calling this warp with multiple source URLs and only the single output path, meaning we warp all the files while downloading: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123

This is actually a lot less complicated than it used to be -- there used to be an rsi_simple_download and an rsi_complex_download to handle the warpable/not-warpable downloads separately, which didn't share any code paths and as a result got quickly out of sync. Handling both via the same path makes things less readable, but means that all downloads flow down the same (heavily tested) pathway.

All that said, I documented this a bit more here: https://github.com/Permian-Global-Research/rsi/pull/78/commits/0b294d3f879e571e07508664a9c0426e7272d705

With regards to rescaling via the warper -- I'm definitely interested in this, but I've seen some rather complex rescaling formulas in the wild that aren't just a simple scale and offset, which has made me a bit spooked. I think there's still some dark magic you can do by writing a VRT with a complex transform equation, but I don't know that I understand VRTs well enough to maintain a package that did that, right now.

It can be a good idea to tile your aoi using sf::st_make_grid()

I think this really needs an example, because there's room for guidance on creating a nice tile mosaic definition (with terra::align for example, or actually with st_tile or getTileExtents). It's very important point, and say what if we wanted a (CONUS) state-level imagery. This can be done a little more abstractly using terra::makeTiles and stars::st_tile (which give the index and extents helpfully but separately). (Maybe an assign-to-me task).

I added an example of using st_make_grid() here: https://github.com/Permian-Global-Research/rsi/pull/78/commits/f95714f51894d1f848ec051e92b1d8227eb04cfb

If you set the rsi_pc_key environment variable to your key (either primary or secondary; there is no difference), rsi will automatically use this key to sign all requests against Planetary Computer.

I only note this as a discussion-bounce, for possible followup: GDAL can do this too, and has file system abstrations that can copy, info, or warp-in-part to a target grid.

This example (in the README) should save the file name as an output. It's otherwise a pipeline that I don't get an object for in the end, and it can take some time (in Australia).

Fixed: https://github.com/Permian-Global-Research/rsi/pull/78/commits/280d513325fcc499329abc37903d0d28e41eea21

I have some minor discomforts about when exactly warping or masking is done. I just want to talk about the details of this and might follow up, I'm a bit lost in the depths of the compositing function and use of sprc and mosaic, though.

Yeah, it's an easy function to get lost in. You can see my own notes here from the last time I was refactoring:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L240

The steps are usually querying, filtering down the returned results, (warping and) downloading the relevant items, masking them, compositing the outputs, and then rescaling.

The warping is controlled by the gdalwarp_options object, primarily. The first "live" code (not just parameter checking) is this bit here, which processes those options: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L295-L300

That processing function just sets the t_srs and tr options, if they weren't provided, to handle the actual warp: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L781-L798

Which then eventually gets passed to the actual download call: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123

So that's warping and downloading handled: each asset is (usually) warped and downloaded separately.

Each asset then gets masked independently: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L388-L394

These are masked independently mostly to make the implementation easier, because now each asset is composited into a single file per asset: https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L399-L409

I didn't want to try and track if all assets existed in all items, or so on, and so we don't aggregate assets into items until after rescaling.

As for compositing: there's three pathways here. The first one I discussed above: if files didn't need to get masked or rescaled, we composited them during the download stage and they skip the composite process entirely.

The second one is if users specified "merge" but also wanted a mask or rescaling, in which case we're basically just calling terra::merge(): https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L690-L697

The third is for all the other functions, which are applied using terra::mosaic(): https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L699-L707

As far as I'm concerned, sprc() is just a container that can handle one-or-more possibly overlapping rasters (compared to rast(), which can't). We're using that to hold the unknown number of possibly-overlapping files associated with a single asset, then merging them using some terra function.

Hope this all made sense!

A standalone question I had: can we use the aoi to drive the STAC query, but then download the files as-is without cropping/warping or compositing? I guess that would mean providing a link to the temp-file space being used.

I added a section to the "How Can I" vignette about one version of this -- downloading each item separately: https://github.com/Permian-Global-Research/rsi/pull/78/commits/528e31f7d62045471ac9233ae8934bfa4da7d445

(To skip masking, you'd also set mask_function = NULL)

If you want to get each asset separately, that's probably where rstac::assets_download() becomes more useful -- or building a query with rstac and using GDAL to grab items yourself. I think at that point, you no longer want the additional abstraction rsi gives you, and it makes sense to go one level "lower" down the stack.

github-actions[bot] commented 1 week ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.