NIEHS / amadeus

https://niehs.github.io/amadeus/

Accept polygons in `calc_*` functions #65

Closed by mitchellmanware 3 months ago

mitchellmanware commented 5 months ago

Currently the `calc_*` functions only accept point locations (with an optional buffer). Extracting/calculating values for arbitrary polygon boundaries would be useful for census-boundary summary statistics.

kyle-messier commented 5 months ago

@mitchellmanware This would be a fantastic enhancement that would be useful for lots of people. Are you thinking of trying to implement this for the manuscript or for later development?

mitchellmanware commented 5 months ago

My goal was for the manuscript. I think this is a perfect example of why the modularization is great.

The function `process_locs_vector` is used in all of the `calc_*` functions to check the user-provided locations and convert them to a `SpatVector` object for the extraction. Editing this function to accept `SpatVector` or `sf` objects containing polygons, and to convert/check them in the same way, would extend polygon summaries to all of the calculation functions.

I will work on this and a few other functionality edits after the current draft is complete. @kyle-messier
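
A minimal sketch of the idea, using a hypothetical name and argument list (`process_locs_sketch`, `locs`, `crs`, `radius`) rather than the actual `process_locs_vector` signature: coerce `sf`, `SpatVector`, or lon/lat `data.frame` inputs, whether points or polygons, to a `SpatVector` before extraction.

```r
## Illustrative only -- not the package implementation.
process_locs_sketch <- function(locs, crs = "EPSG:4326", radius = 0) {
  if (inherits(locs, "SpatVector")) {
    locs_v <- locs
  } else if (inherits(locs, "sf")) {
    locs_v <- terra::vect(locs)
  } else if (is.data.frame(locs)) {
    # assume lon/lat columns for plain data.frame inputs
    locs_v <- terra::vect(locs, geom = c("lon", "lat"), crs = crs)
  } else {
    stop("locs must be a SpatVector, sf, or data.frame object.")
  }
  # buffer point locations when a positive radius is given;
  # polygons pass through unchanged
  if (radius > 0 && terra::geomtype(locs_v) == "points") {
    locs_v <- terra::buffer(locs_v, width = radius)
  }
  locs_v
}
```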

sigmafelix commented 5 months ago

I think `calc_covariates` accepts all types of planar geometries in `locs`. If one uses a buffer radius of 0, input polygons are used as they are (a bonus of buffering with zero radius is that it fixes topological errors). The exceptions are `calc_tri` and `calc_nei`.

If we want all `calc_*` functions to accept polygon inputs, some internal lines need to be revised to determine whether `terra::extract()` should summarize values or return every single pixel value as is. `exactextractr::exact_extract()` only accepts polygon inputs, which means some functions will need to handle the exceptional case of point inputs with a zero buffer radius by passing them to `terra::extract()`.
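
A rough sketch of that dispatch, with an assumed helper name and arguments (`extract_by_geomtype`, `rast`, `locs_v`, `fun`): polygons route to `exactextractr::exact_extract()`, while points (or zero-radius buffers) fall back to `terra::extract()`.

```r
## Illustrative only -- names and arguments are assumptions, not the package API.
extract_by_geomtype <- function(rast, locs_v, fun = "mean") {
  if (terra::geomtype(locs_v) == "polygons") {
    # exact_extract() expects sf polygons; returns one summary per feature
    exactextractr::exact_extract(
      rast, sf::st_as_sf(locs_v), fun = fun, progress = FALSE
    )
  } else {
    # points (or zero-radius buffers) go through terra::extract()
    terra::extract(rast, locs_v, ID = FALSE)
  }
}
```

The calling `calc_*` function would then bind the location identifier column (e.g. `GEOID`) onto the returned values.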

mitchellmanware commented 5 months ago

@kyle-messier The example below shows the flexibility of `process_locs_vector` after the update, extracting "weasd" values at Connecticut county polygons and centroids, and at manually created point locations. I still need to adopt the different `terra` vs. `exactextractr` functions for point vs. polygon inputs (as mentioned by @sigmafelix above), but this will allow values to be summarized at census boundaries.

> n <- process_narr(
+   date = c("2018-01-01", "2018-01-03"),
+   variable = "weasd",
+   path = "narr/data/raw/weasd/"
+ )
Cleaning weasd data for January, 2018...
Detected monolevel data...
Returning daily weasd data from 2018-01-01 to 2018-01-03.
> s <- tigris::counties(state = "CT", year = 2018)
Using FIPS code '09' for state 'CT'
> s_v <- terra::vect(s)
> ## SpatVector (points) input
> calc_narr(
+   from = n,
+   locs = centroids(s_v),
+   locs_id = "GEOID"
+ )
Detected `SpatVector` extraction locations...
Calculating weasd covariates for 2018-01-01...
Calculating weasd covariates for 2018-01-02...
Calculating weasd covariates for 2018-01-03...
Returning extracted covariates.
   GEOID       time   weasd_0
1  09007 2018-01-01 0.4541016
2  09011 2018-01-01 0.0000000
3  09009 2018-01-01 0.1210938
4  09013 2018-01-01 1.1074219
5  09003 2018-01-01 3.4785156
6  09015 2018-01-01 0.0000000
7  09001 2018-01-01 0.0000000
8  09005 2018-01-01 1.0693359
9  09007 2018-01-02 1.1464844
10 09011 2018-01-02 1.1142578
11 09009 2018-01-02 1.6611328
12 09013 2018-01-02 1.7724609
13 09003 2018-01-02 3.6816406
14 09015 2018-01-02 1.2441406
15 09001 2018-01-02 1.1621094
16 09005 2018-01-02 2.7812500
17 09007 2018-01-03 0.8535156
18 09011 2018-01-03 0.3720703
19 09009 2018-01-03 1.3417969
20 09013 2018-01-03 1.3427734
21 09003 2018-01-03 3.7841797
22 09015 2018-01-03 0.3652344
23 09001 2018-01-03 1.3359375
24 09005 2018-01-03 2.3554688
> ## SpatVector (polygons) input
> calc_narr(
+   from = n,
+   locs = s_v,
+   locs_id = "GEOID",
+   fun = "mean"
+ )
Detected `SpatVector` extraction locations...
Calculating weasd covariates for 2018-01-01...
Calculating weasd covariates for 2018-01-02...
Calculating weasd covariates for 2018-01-03...
Returning extracted covariates.
   GEOID       time    weasd_0
1  09007 2018-01-01 0.45410156
2  09011 2018-01-01 0.24169922
3  09009 2018-01-01 0.06054688
4  09013 2018-01-01 2.30712891
5  09003 2018-01-01 2.25488281
6  09015 2018-01-01 0.00000000
7  09001 2018-01-01 0.04833984
8  09005 2018-01-01 2.27392578
9  09007 2018-01-02 1.14648438
10 09011 2018-01-02 1.12939453
11 09009 2018-01-02 0.83056641
12 09013 2018-01-02 2.46533203
13 09003 2018-01-02 3.23779297
14 09015 2018-01-02 1.24414062
15 09001 2018-01-02 0.85937500
16 09005 2018-01-02 3.23144531
17 09007 2018-01-03 0.85351562
18 09011 2018-01-03 0.62548828
19 09009 2018-01-03 0.67089844
20 09013 2018-01-03 2.56445312
21 09003 2018-01-03 3.07861328
22 09015 2018-01-03 0.36523438
23 09001 2018-01-03 1.03808594
24 09005 2018-01-03 3.06982422
> ## sf (polygons) input
> calc_narr(
+   from = n,
+   locs = st_as_sf(s_v),
+   locs_id = "GEOID",
+   fun = "mean"
+ )
Detected `sf` extraction locations...
Calculating weasd covariates for 2018-01-01...
Calculating weasd covariates for 2018-01-02...
Calculating weasd covariates for 2018-01-03...
Returning extracted covariates.
   GEOID       time    weasd_0
1  09007 2018-01-01 0.45410156
2  09011 2018-01-01 0.24169922
3  09009 2018-01-01 0.06054688
4  09013 2018-01-01 2.30712891
5  09003 2018-01-01 2.25488281
6  09015 2018-01-01 0.00000000
7  09001 2018-01-01 0.04833984
8  09005 2018-01-01 2.27392578
9  09007 2018-01-02 1.14648438
10 09011 2018-01-02 1.12939453
11 09009 2018-01-02 0.83056641
12 09013 2018-01-02 2.46533203
13 09003 2018-01-02 3.23779297
14 09015 2018-01-02 1.24414062
15 09001 2018-01-02 0.85937500
16 09005 2018-01-02 3.23144531
17 09007 2018-01-03 0.85351562
18 09011 2018-01-03 0.62548828
19 09009 2018-01-03 0.67089844
20 09013 2018-01-03 2.56445312
21 09003 2018-01-03 3.07861328
22 09015 2018-01-03 0.36523438
23 09001 2018-01-03 1.03808594
24 09005 2018-01-03 3.06982422
> ## data.frame (points) input
> l <- data.frame(lon = -78.8277, lat = 35.95013)
> l$site_id <- "3799900018810101"
> calc_narr(
+   from = n,
+   locs = l,
+   locs_id = "site_id"
+ )
Detected `data.frame` extraction locations...
Calculating weasd covariates for 2018-01-01...
Calculating weasd covariates for 2018-01-02...
Calculating weasd covariates for 2018-01-03...
Returning extracted covariates.
           site_id       time weasd_0
1 3799900018810101 2018-01-01       0
2 3799900018810101 2018-01-02       0
3 3799900018810101 2018-01-03       0
> 
mitchellmanware commented 5 months ago

@sigmafelix

Updated to apply `exactextractr::exact_extract()` for polygon locations and `terra::extract()` for point locations. I will provide a more detailed update in the PR description.

https://github.com/NIEHS/amadeus/blob/9830c507414e0532876411e2da0f5e11e7621879/R/calculate_covariates_auxiliary.R#L358

sigmafelix commented 5 months ago

@mitchellmanware Does it apply to all `calc_*` functions? I found that many functions with `radius = 0` as a default argument use `terra::extract()` to summarize raster values.

mitchellmanware commented 5 months ago

@sigmafelix I have only introduced the new functions and applied them to `calc_narr` in commit 9830c507414e0532876411e2da0f5e11e7621879 so I can track the changes. I am working on applying them to all functions.

Also, now that `process_locs_vector` can accept `terra`, `sf`, and `data.frame` objects with both polygons and points, I will work to apply this function to all of the calculation functions as well.
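
As a quick illustration (reusing `s_v` and the manual point from the example session above), all three supported input classes reduce to equivalent `SpatVector` geometries, which is what lets a single location-processing helper serve every calculation function:

```r
## Illustrative check only; objects come from the earlier example session.
s_sf <- sf::st_as_sf(s_v)                                # sf polygons
l <- data.frame(lon = -78.8277, lat = 35.95013)          # lon/lat data.frame
terra::geomtype(s_v)                                     # "polygons"
terra::geomtype(terra::vect(s_sf))                       # "polygons"
terra::geomtype(terra::vect(l, geom = c("lon", "lat")))  # "points"
```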