Aggregating gridded data returns multiple results per country

MariusBug commented 3 years ago

Hello everyone,

I am trying to aggregate fine-graned global climate grid data to multipolygons (first countries, later on the subnational level).

To accomplish this, I now have two shapefiles at hand, one containing the disaggregated climate data, the other the country geometries.

A glimpse at the climate data:

> head(mst_1950_sf) 
Simple feature collection with 6 features and 2 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: -179.75 ymin: 66.75 xmax: -179.75 ymax: 71.25
CRS:            +proj=longlat +ellps=WGS84 +no_defs
A tibble: 6 x 3
   year moisture        geometry
  <dbl>    <dbl>     <POINT [°]>
1  1950    0.689 (-179.75 71.25)
2  1950    0.752 (-179.75 68.75)
3  1950    0.765 (-179.75 68.25)
4  1950    0.638 (-179.75 67.75)
5  1950    0.811 (-179.75 67.25)
6  1950    0.728 (-179.75 66.75)

And the second shapefile:

> world2015 %>% dplyr::select(ISONAME, geometry) %>% head()
Simple feature collection with 6 features and 1 field
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -175.36 ymin: -21.26806 xmax: -53.98612 ymax: 12.1975
CRS:            +proj=longlat +ellps=WGS84 +no_defs
              ISONAME                       geometry
0              Guyana MULTIPOLYGON (((-58.17262 6...
1            Suriname MULTIPOLYGON (((-55.12796 5...
2 Trinidad and Tobago MULTIPOLYGON (((-61.07945 1...
3           Venezuela MULTIPOLYGON (((-66.31029 1...
4               Samoa MULTIPOLYGON (((-172.5965 -...
5               Tonga MULTIPOLYGON (((-175.1453 -...

Using sp's aggregate, I wanted to take the mean moisture of each country which seemed to work as expected:

statesAg <- aggregate(mst_1950_sf["moisture"], by = world2015$geometry, mean)
plot(statesAg)

Looked good to me - but aggregating this way drops all other variables and I don't see a way to retain other variables of interest.

> names(statesAg)
[1] "moisture" "geometry"

Thus, I merged the new data with my previous dataset to get the other variables of interest back. When checking the results, however, I noted that for many countries several "means" had been returned.

> statesAg <- st_join(world2015, statesAg)
although coordinates are longitude/latitude, st_intersects assumes that they are planar
> statesAg %>% dplyr::select(ISONAME, moisture) %>% head(n=20)
Simple feature collection with 20 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -175.36 ymin: -55.05167 xmax: -53.65001 ymax: 12.1975
CRS:            +proj=longlat +ellps=WGS84 +no_defs
First 10 features:
                ISONAME     moisture                       geometry
0                Guyana  0.129646465 MULTIPOLYGON (((-58.17262 6...
0.1              Guyana  0.207329932 MULTIPOLYGON (((-58.17262 6...
0.2              Guyana  0.130700565 MULTIPOLYGON (((-58.17262 6...
0.3              Guyana -0.009078202 MULTIPOLYGON (((-58.17262 6...
1              Suriname  0.129646465 MULTIPOLYGON (((-55.12796 5...
1.1            Suriname  0.207329932 MULTIPOLYGON (((-55.12796 5...
1.2            Suriname -0.009078202 MULTIPOLYGON (((-55.12796 5...
2   Trinidad and Tobago  0.231250000 MULTIPOLYGON (((-61.07945 1...
3             Venezuela  0.129646465 MULTIPOLYGON (((-66.31029 1...
3.1           Venezuela  0.130700565 MULTIPOLYGON (((-66.31029 1...

What has happened here? Which mean is the actual mean of the respective countries, and which ones was the one plotted above? I would be very glad about any hint about what's been going on.

Thanks!

edzer commented 3 years ago

Are you confusing sp and sf here?

MariusBug commented 3 years ago

Yes, sorry! Looks like I mistook the syntax for sp's.

edzer / sp

Aggregating gridded data returns multiple results per country #104