bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr
Other
54 stars 5 forks source link

Strange things are happening with custom CRS projections in some corners of the world #116

Closed bodkan closed 1 year ago

bodkan commented 1 year ago

I've been playing with spatial models in other corners of the world than just Eurasia, also exploring different area-specific coordinate reference systems. Discovered weird bugs with plotting code which probably indicate something is off with the internal sf-based code which handles transformations of spatial geometric objects.

Consider zooming in on South America here using slendr's world function:

> map <- world(xrange = c(-90, -20), yrange = c(-58, 15))

OGR data source with driver: ESRI Shapefile 
Source: "/private/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T/RtmpLmReLu/naturalearth", layer: "ne_110m_land"
with 127 features
It has 3 fields

Warning message:
No explicit coordinate reference system (CRS) was specified.
A default geographic CRS (EPSG:4326) will be used but please make
sure this is appropriate for your geographic region of interest.

We don't specify a coordinate reference system, and get a warning as expected, but otherwise there's no problem, and we can plot the map with slendr as such:

plot_map(map)
image

However, when a customized projection system is specified, the plotted map is weird:

map <- world(xrange = c(-90, -20), yrange = c(-58, 15), crs = "EPSG:31970")
plot_map(map)
image

What's the grey bar on the right? Is that part of the map object? Plotting artifact?

Now, I picked the CRS EPSG:31970 simply googling "best CRS for South America", so I'm not claiming this is the best CRS projection available... but the map is clearly off.

Serveral things could be wrong here:

  1. slendr's internal sf-based cropping/zooming/transformation
  2. building and zooming of the map works, but it's the slendr's plotting that's broken
  3. ?
bodkan commented 1 year ago

Well, hello. When I plot the map using the raw, base-R plot function and not the fancy slendr plot_map, it clearly shows that the rectangle on the plot above is actually part of the map object. An artificial rectangle which is not supposed to be there. Continuing with the map object from the previous code chunk based on the current slendr version on GitHub:

> map
slendr 'map' object 
------------------- 
map: internal coordinate reference system EPSG 31970 
spatial limits (in degrees longitude and latitude):
  - vertical -90 ... -20
  - horizontal -58 ... 15
> plot(map$geometry)
image
> map$geometry
Geometry set for 15 features 
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 177349 ymin: -8467902 xmax: 9552377 ymax: 3849131
Projected CRS: SIRGAS 2000 / UTM zone 16N
First 5 geometries:
POLYGON ((192654.8 -8070667, 177349 -8068185, 1...
POLYGON ((1154872 -8024927, 1135576 -8064889, 1...
POLYGON ((207147.9 -8192244, 245393 -8171161, 2...
POLYGON ((1822678 -6230112, 1902324 -6285202, 1...
POLYGON ((2504480 -6127736, 2468216 -6156441, 2...
> 

The culprit is the 15th geometry in the set of polygons forming the cropped/zoomed South American map:

> map$geometry[15]
Geometry set for 1 feature 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 7006138 ymin: -8467902 xmax: 9552377 ymax: 3849131
Projected CRS: SIRGAS 2000 / UTM zone 16N
POLYGON ((7006138 3849131, 9552377 3849131, 955...
> plot(map$geometry[15])                                                                                        
image

It's extremely hard for me to track how this happens in the sf-transformation internals. But it's clear things get broken during the zooming/cropping and not the plotting afterwards.

bodkan commented 1 year ago

Interesting. Looking into the guts of slendr which deal with spatial coordinate transformations (code that's been written before slendr was even slendr 😬), I see that it first:

  1. Generates the whole map of the world [link]
  2. Converts the map into the user-provided CRS projection [link]
  3. Creates a cropping bounding box based on the WGS-84 lat/lon coordinate ranges (i.e. a rectangle to which it zooms the map into) [link]
  4. Transforms the bounding box into the requested CRS [link]
  5. Crops the CRS-transformed map based on the CRS-transformed bounding box [link]

This has worked OK since basically forever. But it doesn't seem to work here.

Now, when I extract the code from the slendr guts and flip the 1-2-3-4-5 order into 1-3-5-2 (skipping 4 completely), this gets rid of the artifact above:

library(slendr)
library(rnaturalearth)

target_crs <- "EPSG:31970"

# (1)
# prepare the map files on disk and load them as a spatial sf object
ne_dir <- file.path(tempdir(), "naturalearth"); scale <- "small"
utils::unzip(system.file("naturalearth/ne_110m_land.zip", package = "slendr"), exdir = ne_dir)
map_raw <- rnaturalearth::ne_load(
  scale = scale, type = "land", category = "physical",
  returnclass = "sf", destdir = ne_dir
)
sf::st_agr(map_raw) <- "constant"

# (3)
# define the zoom bounding box
xrange <- c(-90, -20)
yrange <- c(-58, 15)
zoom_bounds <- slendr:::define_zoom(xrange, yrange, "EPSG:4326")

# (5)
# crop the map to the defined bounding box
map_cropped <- sf::st_crop(sf::st_make_valid(map_raw), zoom_bounds)

# (2)
# transform the map into the new CRS
map_transf <- sf::st_transform(map_cropped, crs = "EPSG:31970") %>% sf::st_make_valid()

# add slendr-specific metadata
class(map_transf) <- slendr:::set_class(map_transf, "map")
attr(map_transf, "xrange") <- xrange
attr(map_transf, "yrange") <- yrange

# success?
plot_map(map_transf)
image

It's a little hard for me to understand all the geographic edge cases involved in these transformations, but it's interesting that avoiding the redundant CRS transformation of the bounding box (item # 4 in the list in this post) not only removes the artifact rectangle but this also produces only those spatial features from the very first figure in this issue (the one based on standard WGS-84 lat/lon coordinate map). We get only four spatial polygons instead of 15 above (which include Central America, Antarctica, etc...).

> map$geometry
Geometry set for 4 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 175428.6 ymin: -6330189 xmax: 7247569 ymax: 1756876
Projected CRS: SIRGAS 2000 / UTM zone 16N
POLYGON ((1748801 -6049351, 1736945 -5991773, 1...
POLYGON ((2413171 -6082500, 2366832 -6032383, 2...
POLYGON ((3449952 1333507, 3424219 1335445, 335...
POLYGON ((175428.6 1512472, 195560.6 1496466, 2...

Which is, in a sense, exactly what this should be doing?

bodkan commented 1 year ago

Interestingly, while fixing the issue for the map above, this changes (but doesn't break) the default West Eurasian map used in the tutorial.

map <- world(
  xrange = c(-15, 60),  # min-max longitude
  yrange = c(20, 65),   # min-max latitude
  crs = "EPSG:3035"     # coordinate reference system (CRS) for West Eurasia
)

plot_map(map)

Here is the original version:

image

And here is the fixed version which first crops the standard WGS-84 map to the rectangular bounding box, and only then transforms the cropped map to the user-provided CRS:

image

Again, note that the fixed version on the bottom does, indeed, crop at xrange = 20-65 degrees latitude and yrange = 20-65 degrees longitude, unlike the original map which used to shows features which are clearly outside of the desired cropping box specified by xrange and yrange.

It looks quite a bit weirder, but correctness beats aesthetics here I guess. This is really just about looks, the zoomed-in features of interest are the same in both maps.


In any case, for anyone reading this at a later point, note that it is possible to provide any set of spatial coordinates to the world function via its landscape = argument:

# reusing code above to get a Natural Earth map again (could be any other source
# of data in the sf format)
ne_dir <- file.path(tempdir(), "naturalearth"); scale <- "small"
utils::unzip(system.file("naturalearth/ne_110m_land.zip", package = "slendr"), exdir = ne_dir)
map_raw <- rnaturalearth::ne_load(
  scale = scale, type = "land", category = "physical",
  returnclass = "sf", destdir = ne_dir
)
sf::st_agr(map_raw) <- "constant"

# define a circular zoom regiong
zoom_bounds <- sf::st_point(c(-70, 10)) %>%
  sf::st_sfc(crs = 4326) %>%
  sf::st_buffer(dist = units::set_units(15, arc_degree))
map_cropped <- sf::st_crop(sf::st_make_valid(map_raw), zoom_bounds)
sf::st_agr(map_cropped) <- "constant"

# use the cropped customized map as a source of data for the world function
map_slendr <- world(xrange = c(-90, -50), yrange = c(26, -6),
                    landscape = map_cropped, crs = "epsg:31970")
plot_map(map_slendr)
image
petrelharp commented 1 year ago

Nice detective work. This all makes sense (I mean, I don't know where that weird box comes from, but I could imagine how weird things could have been happening. And I agree that correctness is better than prettyness... although, actually, I think another resolution could be to say that specifying lat/long bounds will get you "the map whose rectangle includes these lat/long bounds in the specified projection"? Since: people shouldn't really be relying on the lat/long bounds for the details of the model (just the populations should be within the bounds), and so these bounds are more for plotting? And showing the whole map is nicer?

bodkan commented 1 year ago

Hello Peter, thanks for your thoughts. I 100% agree with what you're saying. I'd prefer the whole map and I also think nobody will (should?) rely on the bounds of the map for the specifics of their model.

I'll try to find out where does the weird box come from and see if it's possibly a bug in sf or whether it's due to my misunderstanding of the cropping.

For future reference (also in case someone runs into a similar problem and prefers to do the zooming/cropping manually), this is a nice tutorial describing how to do this with sf. In fact, I think I used the same tutorial when I was writing the very first version of slendr's world() function. Here it is https://www.r-bloggers.com/2019/04/zooming-in-on-maps-with-sf-and-ggplot2/.

I will keep this open until I at least find out where does the weird box come from. If the issue really does lie somewhere inside sf (or, heaven forbid, in something at a lower level), I would probably favour correctness over prettyness, if only to protect non-Eurasian parts of the world from having to work around these weird artifacts.

petrelharp commented 1 year ago

One possibility of what the box might be is that it's the actual bounding box itself? (from the pre-transformation crop) Well, not the 'bounding box' itself, but a rectangle that gets added by that step.

bodkan commented 1 year ago

So, after more digging, I think the mystery has been solved. The above described behavior seems to be an artefact from the CRS transformation of the original world map before the actual cropping.

Observe the following, in which I reproduce the issue which originated this bugreport (zooming on South America):

First let's load a couple of packages and download some geographic data (same source as internally used by slendr in its world() function):

library(sf)
library(rnaturalearth)

ne_dir <- file.path(tempdir(), "naturalearth"); scale <- "small"
unzip(system.file("naturalearth/ne_110m_land.zip", package = "slendr"), exdir = ne_dir)
map_raw <- ne_load(scale = scale, type = "land", category = "physical", returnclass = "sf", destdir = ne_dir)

Here is the whole map:

plot(map_raw$geometry)
image

Now let's define a cropping box (same as what started this Github issue in the first place):

xrange <- c(-90, -30)
yrange <- c(-58, 15)
zoom_bounds <- st_as_sf(st_sfc(st_polygon(list(matrix(c(
  xrange[1], yrange[1],
  xrange[2], yrange[1],
  xrange[2], yrange[2],
  xrange[1], yrange[2],
  xrange[1], yrange[1]
), byrow = TRUE, ncol = 2)))), crs = 4326)

Plotting the cropping box on the original map we see:

plot(map_raw$geometry)
plot(zoom_bounds, add = TRUE, border = "red")
legend("topright", legend = "Zooming box", fill = "red")
image

So far so good! Let's now transform the map into the desired projected CRS (again, the same thing slendr's world() function does now, before the actual zooming/cropping):

map_transf <- st_transform(map_raw, crs = 31970)
plot(map_transf$geometry)
image

Yikes! What is that weird shape that appeared "west" of South America? That is Africa. :(

image

So let's get back to where we originally started -- for some reason, slendr's world() function generated a rectangular artefact uppon cropping a map to South America with a given CRS projection. Like this:

zoom_transf <- st_transform(zoom_bounds, crs = 31970)
map_cropped <- st_crop(st_make_valid(map_transf), zoom_transf)
plot(map_cropped$geometry)
image

So where does this weird rectangle come from?

Let's first overlay the transformed zoom box over the entire transformed map:

plot(map_transf$geometry)
plot(zoom_transf, add = TRUE, border = "red")
legend("topright", legend = "Zooming box", fill = "red")
image

Notice above how the transformed zoom box intersects part of the malformed Africa:

Indeed, when we plot the transformed zooming box on the zoomed-in and transformed South American region of the map, we see that the rectangle comes from the malformed African continent upon CRS transformation, which is caught by a small corner of the zoom box -- this makes a part of that "Africa" to be represented by a vertical rectangle running on the right border of the zoom region. In other words, the fact that the transformed red zooming box catches a tiny bit of the malformed African continent after CRS transformation is interpreted as if that "Africa" should be visible within the zoomed-in window. Which is exactly what's happening.

plot(map_cropped$geometry, border = TRUE)
plot(zoom_transf, add = TRUE, border = "red")
legend("topright", legend = "Zooming box", fill = "red")
image

... and this is how the weird rectangle appeared on our cropped map.

bodkan commented 1 year ago

So, to summarise the cause of the problem:

Projecting a part of a world map into a CRS which is appropriate one part of the world (in fact, it works best for that part of the world - South America here) can completely mess up the geometry in a very different part of the world (Africa in this case). If a CRS-transformed rectangular zooming region is used for cropping, it can easily catch an unwanted feature of a CRS-transformed map from a corner of the world where such CRS is nonsensical.

Nothing can be done with that. In fact, there is nothing wrong with this behavior. Cropping with a polygon specified with lat/lon coordinates can only make sense on a map which uses lat/lon coordinates! Projected CRS assumes a 2D map where lat/lon degrees no longer make sense.

The solution to this is already presented above. The map needs to be first cropped to the zooming box specified in lat/lon coordinates, and only what remains after this cropping can be transformed to the desired CRS. I will open a PR to make the small change required to make this work.

Note that this causes some aesthetic issues (like this), but this is really only about prettiness, not function. And, as Peter mentioned above, models should not rely on the exact spatial extent of a map anyway.

bhaller commented 1 year ago

I would just like to say that this is possibly the most aesthetic bug report ever. :-> Very interesting detective work! I feel like the figures above are so interesting and revealing of the underlying process that it will be a sad day when this issue is closed and they are never again seen by humanity. :->

bodkan commented 1 year ago

Haha, I'm glad you enjoyed reading this! This was a bit frustrating to figure out (these geo-related things are so complex for someone without any background in the field, like myself) but also fun because I've learned a lot. Or rather, I now have a much better understanding about how much I don't know. :)

You're right that it would be a bit of a shame to simply file this away. 🤔

I think a smaller vignette dedicated to CRS transformations, and preparing customized geographic landscapes as a simulation background might be useful. I've been trying hard to avoid this so far because there are entire books written on the topic and there are so many edge-cases involved in each step of the process described above... It's quite impossible to capture all of that in a satisfying way.

That said, there are situations where our world() function won't be sufficient (custom maps from sources other than the Natural Earth project being one example) and I have been thinking about adding a general import_map() function which would take in manually prepared data sets (basically what I've done in the long message above) and format them for simulation in slendr (adding bunch of simulation-related metadata, mostly).

The code and figures above could be a good basis for this new vignette and for demonstrating that future import_map() function.