GlobalFishingWatch / gfwr

R package for accessing data from Global Fishing Watch APIs
https://globalfishingwatch.github.io/gfwr/
Apache License 2.0

`get_raster()` returns data that doesn't cover full extent of user `sf` object #164

Closed by jflowernet 1 month ago

jflowernet commented 1 month ago

When using `get_raster()` with a user-supplied `sf` region, the data returned does not fully cover the extent of the `sf` object. The reprex below illustrates the issue: the xmax and ymax of `test_shape` are both higher than those of `test_data`, and the xmin of `test_shape` is lower, so some data is being cut off. My guess is that the extent coordinates get rounded to the grid that GFW data uses, but rather than rounding min values down and max values up, it is rounding to the nearest grid cell, which might be higher or lower.
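The rounding guess can be made concrete with a little base R arithmetic. This is only a sketch of the hypothesis, not of what the API actually does: the 0.01-degree cell size for `'HIGH'` resolution is an assumption, and `snap_nearest()` and `snap_outward()` are hypothetical helpers written for illustration.

```r
# Assumption: 0.01-degree cells for the 'HIGH' resolution; adjust `res` as needed.
res <- 0.01

# Bounding box of test_shape, copied from the st_bbox() output in the reprex.
bbox <- c(xmin = 56.74815, ymin = 0, xmax = 70, ymax = 21.79799)

# Hypothetical helper: snap every edge to the nearest grid line.
snap_nearest <- function(bbox, res) round(bbox / res) * res

# Hypothetical helper: snap minima down and maxima up so the box never shrinks.
snap_outward <- function(bbox, res) {
  out <- bbox
  mins <- c("xmin", "ymin")
  maxs <- c("xmax", "ymax")
  out[mins] <- floor(bbox[mins] / res) * res
  out[maxs] <- ceiling(bbox[maxs] / res) * res
  out
}

nearest <- snap_nearest(bbox, res)
outward <- snap_outward(bbox, res)

# Nearest-rounding moves xmin inward (to about 56.75), trimming the western edge.
nearest[["xmin"]] > bbox[["xmin"]]

# Outward-rounding keeps the original box covered on those edges.
outward[["xmin"]] <= bbox[["xmin"]] && outward[["ymax"]] >= bbox[["ymax"]]
```

Under this assumption, either kind of rounding could shift an edge by at most one cell width.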

Thanks for all the updates to gfwr - I appreciate the GFW team's hard work!

```r
library(gfwr)
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE

# data and query from ?get_raster
data(test_shape)
test_data <- get_raster(spatial_resolution = 'HIGH',
                        temporal_resolution = 'YEARLY',
                        start_date = '2021-01-01',
                        end_date = '2021-10-01',
                        region = test_shape,
                        region_source = 'USER_JSON')
#> Rows: 58742 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# returned data does not cover full extent of test_shape: compare min and max
st_bbox(test_shape)
#>     xmin     ymin     xmax     ymax 
#> 56.74815  0.00000 70.00000 21.79799
summary(test_data[1:2])
#>       Lat             Lon       
#>  Min.   : 0.00   Min.   :56.85  
#>  1st Qu.:11.07   1st Qu.:62.80  
#>  Median :14.86   Median :63.81  
#>  Mean   :12.57   Mean   :64.15  
#>  3rd Qu.:16.06   3rd Qu.:65.34  
#>  Max.   :21.25   Max.   :69.55
```

Created on 2024-07-18 with reprex v2.1.1
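The size of the mismatch on each side can be worked out directly from the printed values. The sketch below hard-codes the numbers from the reprex output (which `summary()` rounds for display) rather than calling the API; the 0.01-degree cell size is an assumption about the `'HIGH'` resolution.

```r
# Values copied from the reprex output (summary() rounds them for display).
# With the real objects, data_bbox could instead be built from
# range(test_data$Lon) and range(test_data$Lat).
shape_bbox <- c(xmin = 56.74815, ymin = 0, xmax = 70, ymax = 21.79799)
data_bbox  <- c(xmin = 56.85, ymin = 0, xmax = 69.55, ymax = 21.25)

# Positive values mean the data stop short of the shape's edge on that side.
gap <- c(
  west  = data_bbox[["xmin"]] - shape_bbox[["xmin"]],
  south = data_bbox[["ymin"]] - shape_bbox[["ymin"]],
  east  = shape_bbox[["xmax"]] - data_bbox[["xmax"]],
  north = shape_bbox[["ymax"]] - data_bbox[["ymax"]]
)

# Express each gap in grid cells, assuming 0.01-degree cells.
res <- 0.01
gap_cells <- gap / res
round(gap_cells)
```

If the cell-size assumption holds, the eastern and northern gaps span dozens of cells, which is more than rounding the extent to the grid could produce on its own.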

AndreaSanchezTapia commented 1 month ago

Hi @jflowernet, thanks for reaching out about this. The API returns only the fishing effort data that is available within the provided region; it does not fill in the rest of the region's surface (that is, it does not rasterize the data in the sense of assigning a value to every cell of a raster covering the region). For example, for the data you provided, the data retrieved looks like this: [image]

Therefore, there is no expectation that the results and the provided region have exactly the same bounding box.
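If a value for every cell is needed, the sparse result can be expanded on the client side. The following is a minimal base R sketch using toy data, not actual API output: the `hours` column, the grid extent, and the 1-degree spacing are all hypothetical, and treating missing cells as zero is an assumption about how to interpret cells the API does not return.

```r
# Toy stand-in for the API result: only cells with fishing effort are present.
sparse <- data.frame(
  Lat   = c(0.5, 1.5),
  Lon   = c(10.5, 11.5),
  hours = c(2.0, 3.5)
)

# Full 3 x 3 grid of cell coordinates covering a hypothetical area of interest.
grid <- expand.grid(
  Lat = c(0.5, 1.5, 2.5),
  Lon = c(10.5, 11.5, 12.5)
)

# Left join keeps every grid cell; cells without effort come back as NA.
full <- merge(grid, sparse, by = c("Lat", "Lon"), all.x = TRUE)

# Assumption: a cell absent from the result means no recorded effort.
full$hours[is.na(full$hours)] <- 0

# The sparse data span a smaller extent than the grid they were drawn from.
range(sparse$Lon)
range(full$Lon)
```

The bounding box of `sparse` is narrower than that of `full`, which mirrors the difference between the returned data and the region in the reprex.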

Please let me know if this is useful or if you have any other questions about this and don't hesitate to come back with feedback for the package.

jflowernet commented 1 month ago

Ahh, yes of course! I was thinking that it is retrieving cells within the bounding box, but actually it is cell centroids. Many thanks for the explanation @AndreaSanchezTapia - keep up the good work :smile: