e-sensing / sits

Satellite image time series in R
https://e-sensing.github.io/sitsbook/
GNU General Public License v2.0
456 stars 76 forks source link

Handling of `roi` in `sits_cube()` #1209

Open GreatEmerald opened 2 weeks ago

GreatEmerald commented 2 weeks ago

This is something in between a bug report and a feature request about how ROI is handled in sits_cube.

  1. The documentation says that the roi argument of sits_cube() accepts an sf object, shapefile (a filename?), a named vector or named lat/long values. It does not seem to support the usual SpatExtent objects from terra, but it would be handy, otherwise one has to always do as.vector(MyExtent) for them to work.
  2. Whenever I run the function, I always get the warning Warning: no CRS informed, assuming EPSG:4326. However, there is no way I can see to actually inform the CRS, as the argument crs does not seem to exist. It does exist in sits_get_data(), so perhaps it should also be included in sits_cube(). (It is not clear from the documentation whether the one in sits_get_data() supports WKT, by the way.)
  3. The roi seems to work as a spatial filter for selecting tiles, i.e. sits_cube_copy() will download the entire tile, rather than what is in the ROI. For cases with a small ROI, that's a huge waste of network bandwidth and disk space. Considering that it's implemented via vsicurl, it should be possible to only download a particular area, such as via a VRT.

My use case is to download patches of Sentinel-2 imagery for deep learning, these are merely 300x300 or 100x100 pixels in size, scattered across the world. Having to download and store entire tiles just to discard the majority of the images is not particularly efficient, considering that COGs can easily serve only the requested areas.

Here's a MWE showing all these issues. Using roi = as.vector(e) results in getting tiles for the whole world.

library(terra)
#> terra 1.7.78
library(sits)
#> SITS - satellite image time series analysis.
#> Loaded sits v1.5.1.
#>         See ?sits for help, citation("sits") for use in publication.
#>         Documentation avaliable in https://e-sensing.github.io/sitsbook/.

f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
rl <- project(r, "EPSG:2169")
e <- ext(rl)

sc <- sits_cube("MPC", "MOD13Q1-6.1", bands="RED",
                roi = e, crs = crs(rl),
                start_date = "2023-07-01", end_date = "2023-07-15")
#> Error in .roi_type(roi): invalid roi parameter

Created on 2024-08-29 with reprex v2.1.1

gilbertocamara commented 2 weeks ago

Dear @GreatEmerald thanks for the contribution! Let's take your points one by one.

1. Add SpatExtent objects from terra.

We will implement your suggestion, and include the additional crs parameter. Meanwhile, please consider using the alternative with stars and sf as shown in the MWE below.

2. Remove warning: no CRS informed, assuming EPSG:4326 Agreed; this message will be removed.

3. Does crs in sits_get_data() support WKT? Not yet; will be fixed ASAP.

4. sits_cube_copy() will download the entire tile, rather than what is in the ROI.

That's not correct. Please run the following MWE

library(terra)
#> terra 1.7.78
library(sits)
#> SITS - satellite image time series analysis.
#> Loaded sits v1.5.1.
#>         See ?sits for help, citation("sits") for use in publication.
#>         Documentation avaliable in https://e-sensing.github.io/sitsbook/.

f <- system.file("ex/elev.tif", package="terra")
# read the TIFF file with stars
st <- stars::read_stars(f)
# get the bbox
bbox <- sf::st_bbox(st)
# rename the bbox 
names(bbox) <- c("lon_min", "lat_min", "lon_max", "lat_max")

# select the cube
sc <- sits_cube("MPC", "MOD13Q1-6.1", bands="RED",
                roi = bbox,
                start_date = "2023-07-01", end_date = "2023-07-15")
# copy part of the cube using roi 
sc_copy <- sits_cube_copy(
    sc,
    roi = bbox,
    output_dir = tempdir()
)
GreatEmerald commented 2 weeks ago

Aha, you're right, passing the roi to sits_cube_copy() indeed works! The reason I thought it didn't is because apparently the function fails to parse vectors in the format of "xmin", "xmax", "ymin", "ymax", despite what it says in the documentation, as it gives Error: .check_empty_data_frame: no intersection between roi and cube. Using the lon_min, lat_min, lon_max and lat_max names indeed works as expected.