StirlingCodingClub / studyGroup

Gather together a group to skill-share, co-work, and create community
http://StirlingCodingClub.github.io/studyGroup/
Other
2 stars 1 forks source link

Calculating extent of overlap of multiple polygons (SpatialPolygonsDataFrame) #30

Open jejoenje opened 5 years ago

jejoenje commented 5 years ago

Hi all

Could someone help me with some R-based spatial analyses? I figure I must be doing something really simple wrong but I can't seem to work this out easily...

So, what I want to do is to calculate the extent of overlap of one polygon (or set of polygons) with another set of polygons. More specifically, I have species distribution maps and I want to calculate how much of each country a species occurs in (i.e. a value of 1 would mean it occurs in all of a given country, 0.5 it would occur in 50% of a country etc). In the simplest case, a species would consist of only a single polygon overlapping multiple country polygons. However, in many cases a species distribution consists of multiple (non-contiguous) polygons.

The first answer on this Stackexchange thread essentially seems to neatly illustrate the sort of problem thing I'm talking about and essentially gives me what I would need (area of overlap).

However, I don't seem to be able replicate this example with my "real" data (just as an example - Indian Pangolin). Here is my code (code and data in https://github.com/jejoenje/sp_poly_overlap):

library(rgdal)
library(cshapes)
library(raster)

# Read the spatial distribution data. In this example, the distribution of Indian Pangolin (Manis crassicaudata).
# This is taken from a collection of Shapefiles of species distributions.
# This file is in the Github repo https://github.com/jejoenje/sp_poly_overlap 

mcras <- readOGR("mcras.shp")

# Plot the distribution:
plot(mcras, col = "red")

# Get basic country outlines from the cshapes package:
world <-cshp(as.Date('2016-01-01'))

# Make sure the projections match:
projection(mcras) <- projection(world)

# Crop the world country map to the extent of the species distribution (I think intersect() as used below does this
# automatically anyway, but for the sake of plotting)
mcras_base <- crop(world, extent(mcras))

# Plot distribution on top of base map.
plot(mcras_base)
plot(mcras, add=T, col="red")
# Note the occurrence across India, Bangladesh and Pakistan, but particularly in Sri Lanka.
# The latter is obviously a second, separate polygon (there are two separate entries in the data):
polygons(mcras)
mcras@data
# The base map has an entry for each country in the extent:
polygons(mcras_base)
mcras_base@data

# Now following the example on https://gis.stackexchange.com/questions/140504/extracting-intersection-areas-in-r
# use intersect() to intersect the polygons in mcras with those in mcras_base:
mcras_i <- intersect(mcras, mcras_base)
plot(mcras_i)
# We can now calculate the area in each country (arbitrary scalar for the sake of argument) :
mcras_i$area <- area(mcras_i)/1000000 
aggregate(area ~ CNTRY_NAME, data = mcras_i, sum)

# So this gives areas of the species distribution in each country, but it is obviously missing Sri Lanka entirely.
# Note how the resulting data only seems to have the intersects with the SECOND polygon in mcras, 
# i.e. only the contiguous area in India, Pakistan and Bangladesh. This tallies with the summary above.
mcras@data
mcras_i@data

# How do I make sure the same intersection is calculated for BOTH the polygon parts in mcras?
# I can plot them separately:

plot(mcras_base)
plot(mcras[1,], col="red", add=T)  # First line in mcras represents the distribution in Sri Lanka
plot(mcras[2,], col="blue", add=T) # Second line is the main distribution in India, Bangladesh and Pakistan.

# However, if I try to intersect the two species polygon parts separately, this works for the contiguous block:
mcras_i2 <- intersect(mcras[2,], mcras_base)
plot(mcras_i2)
# ... but NOT for the first block (Sri Lanka):
mcras_i1 <- intersect(mcras[1,], mcras_base)
plot(mcras_i1)
mcras_i1@data
mcras_i1@polygons

In the example I've based this off, it does seem to work with two polygons in one set, and multiple ones in the other (i.e. it doesn't only calculate overlaps for the second of the "field" polygons... and I can't figure out why? Is this something to do with my species distribution data format?

Any thoughts greatly appreciated!

bradduthie commented 5 years ago

I keep meaning to take a look at this more carefully @jejoenje -- will try to help soon here.

mattnuttall00 commented 5 years ago

Hi @jejoenje , apologies I don't know what it going on here, but based on the coding club today, Thiago is probably the best person to ask!