Open dannyparsons opened 7 years ago
Here's my current understanding of how to do the sort of mapping we would like using ggplot2.
ggmap is a nice package but not what we need at the moment. It is for pulling down maps from e.g. Google Maps. It requires internet access and is most useful for detailed maps such as street-level maps.
The maps package has a definition of country boundaries which can be reshaped ready for ggplot2 using ggplot2::map_data, e.g.
world <- ggplot2::map_data("world")
head(world)
long lat group order region subregion
1 -69.89912 12.45200 1 1 Aruba <NA>
2 -69.89571 12.42300 1 2 Aruba <NA>
3 -69.94219 12.43853 1 3 Aruba <NA>
4 -70.00415 12.50049 1 4 Aruba <NA>
5 -70.06612 12.54697 1 5 Aruba <NA>
6 -70.05088 12.59707 1 6 Aruba <NA>
Then this can be plotted by:
ggplot(world, aes(x = long, y = lat, group = group)) + geom_polygon()
If you then have your own data frame with long and lat values for points of interest, this is just a matter of adding a geom_point layer on top.
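As a minimal sketch of that point layer: the `sites` data frame and its approximate coordinates below are hypothetical and only there for illustration. The `group` aesthetic is set on the polygon layer only, so the points data does not need a `group` column.

```r
library(ggplot2)

# Country outlines from the maps package, reshaped for ggplot2
world <- map_data("world")

# Hypothetical points of interest (approximate coordinates, illustration only)
sites <- data.frame(
  name = c("Nairobi", "Accra", "Algiers"),
  long = c(36.82, -0.19, 3.06),
  lat  = c(-1.29, 5.60, 36.75)
)

p_points <- ggplot(world, aes(x = long, y = lat)) +
  # group is only needed by the polygons, so it is mapped in this layer
  geom_polygon(aes(group = group), fill = "grey85", colour = "white") +
  # the points reuse the long/lat mapping but take their values from sites
  geom_point(data = sites, colour = "red", size = 2) +
  coord_quickmap()

p_points
```

`coord_quickmap()` is optional; it just keeps the aspect ratio roughly sensible for a long/lat plot.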
A heat map is more difficult, because this is done by adding fill to the polygon layer, which means merging with the world data.
So take the country counts data above
country_counts <- rio::import("C:/Users/Danny/Downloads/country_counts.csv")
head(country_counts)
country count
1 Algeria 40
2 Angola 3
3 Benin 9
4 Botswana 2
5 Burkina Faso 24
6 Burundi 13
I can do a right_join to merge the counts and drop all the countries without a count:
mer <- dplyr::right_join(world, country_counts, by = c(region = "country"))
head(mer)
long lat group order region subregion count
1 8.576563 36.93721 486 35622 Algeria <NA> 40
2 8.597656 36.88388 486 35623 Algeria <NA> 40
3 8.601269 36.83393 486 35624 Algeria <NA> 40
4 8.506739 36.78750 486 35625 Algeria <NA> 40
5 8.444238 36.76074 486 35626 Algeria <NA> 40
6 8.369629 36.63252 486 35627 Algeria <NA> 40
Then plot this:
ggplot(mer, aes(x = long, y = lat, group = group, fill = count)) + geom_polygon()
If I do a left_join I keep all the outlines of the other countries:
mer <- left_join(world, country_counts, by = c(region = "country"))
ggplot(mer, aes(x = long, y = lat, group = group, fill = count)) + geom_polygon()
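For anyone without the CSV, here is a self-contained sketch of the left_join version. It rebuilds only the six rows of country_counts printed earlier, so it is not the full dataset. The colour choices and the `na.value` grey for countries without a count are my own assumptions about what might look reasonable.

```r
library(ggplot2)
library(dplyr)

world <- map_data("world")

# Only the six rows shown by head(country_counts); not the full attached file
country_counts <- data.frame(
  country = c("Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi"),
  count   = c(40, 3, 9, 2, 24, 13)
)

# left_join keeps every row of world in its original order, which matters
# because geom_polygon draws the outline points in row order
mer <- left_join(world, country_counts, by = c(region = "country"))

p_heat <- ggplot(mer, aes(x = long, y = lat, group = group, fill = count)) +
  geom_polygon(colour = "white") +
  # countries with no count (NA) are drawn in light grey
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = "grey90") +
  coord_quickmap()

p_heat
```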
There are a few problems with this:
- world data only has names, no ISO codes to match on. It could be very difficult to get the names in your own data consistent with those in world.
- world data doesn't have region categories such as continent, so we can't easily subset for a continent.
There are other datasets that could help with these issues.
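One cheap way to see how bad the name problem is for a given dataset is to list the names that have no match in world before merging. This is only a sketch: `my_data` is a hypothetical data frame, and "United States" is included on purpose because world uses "USA".

```r
library(ggplot2)

world <- map_data("world")

# Hypothetical user data; "United States" is deliberately spelled differently
# from the name used in the world data ("USA")
my_data <- data.frame(
  country = c("Algeria", "Angola", "United States"),
  count   = c(40, 3, 7)
)

# Names in my_data that have no matching region in world
unmatched <- setdiff(my_data$country, unique(world$region))
unmatched
```

Anything listed in `unmatched` would be silently dropped by a left_join onto world, so it needs renaming first.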
rworldmap::countryRegions lists countries and different region categories, which could be merged with world to then subset for a continent. However, this isn't straightforward because the names in each don't match perfectly. There is another dataset, rworldmap::countrySynonyms, which has different common names for countries and could be useful for merging.
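A rough sketch of the continent subset, assuming rworldmap is installed and that countryRegions has a country name column called `ADMIN` and a region column called `REGION` containing the value "Africa" (worth checking with `names(countryRegions)` first). Only exact name matches are kept here, which is exactly the limitation described above.

```r
library(ggplot2)

world <- map_data("world")

# countryRegions ships with rworldmap; the column names ADMIN and REGION
# and the value "Africa" are assumptions to verify against your version
data("countryRegions", package = "rworldmap")

african_names <- countryRegions$ADMIN[which(countryRegions$REGION == "Africa")]

# Keep only world polygons whose name matches exactly
africa <- world[world$region %in% african_names, ]

# Names that did not match and would need fixing by hand or via countrySynonyms
not_in_world <- setdiff(african_names, unique(world$region))
not_in_world

p_africa <- ggplot(africa, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white") +
  coord_quickmap()

p_africa
```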
To make these maps look nice we would want options like labels for the countries, adding capital cities with labels etc. Surprisingly, this isn't all straightforward in ggplot2.
There is maps::world.cities, which has details about cities that can easily be plotted. Again, the issue is subsetting, because this may have to be merged to only plot cities in certain countries/regions.
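As a sketch of the subsetting for cities: this filters maps::world.cities down to capitals (`capital == 1`) in a hypothetical selection of countries and plots them with labels. It relies on the country names in `country.etc` matching those in world, which holds for these three but is not guaranteed in general.

```r
library(ggplot2)

world <- map_data("world")
data("world.cities", package = "maps")

# Hypothetical selection of countries
target <- c("Algeria", "Angola", "Benin")

countries <- world[world$region %in% target, ]
capitals  <- world.cities[world.cities$country.etc %in% target &
                            world.cities$capital == 1, ]

p_cities <- ggplot() +
  geom_polygon(data = countries, aes(x = long, y = lat, group = group),
               fill = "grey85", colour = "white") +
  geom_point(data = capitals, aes(x = long, y = lat)) +
  # nudge the label above the point
  geom_text(data = capitals, aes(x = long, y = lat, label = name),
            vjust = -0.8, size = 3) +
  coord_quickmap()

p_cities
```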
Adding labels to polygons is easy, but getting the positioning right might not be. There are complicated methods of finding the centroid of the polygon etc. to give good positions.
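A crude version of label positioning, to show where the difficulty lies: the sketch below puts each label at the midpoint of the country's bounding box. This is not a true centroid and can land outside oddly shaped countries or ones with exclaves; the three countries are a hypothetical selection.

```r
library(ggplot2)
library(dplyr)

world <- map_data("world")

# Hypothetical selection of countries
target <- c("Algeria", "Angola", "Benin")
countries <- filter(world, region %in% target)

# Crude label position: midpoint of each country's bounding box
label_pos <- countries %>%
  group_by(region) %>%
  summarise(long = mean(range(long)), lat = mean(range(lat)))

p_labels <- ggplot(countries, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "grey85", colour = "white") +
  geom_text(data = label_pos, aes(label = region), size = 3) +
  coord_quickmap()

p_labels
```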
Interestingly, outside ggplot2 there are nice solutions to these. rworldmap::joinCountryData2Map can merge country data using rworldmap::countrySynonyms, but can also use ISO codes. maps::map.cities() can add cities with labels to a plot, and maps also has a nice option for adding country labels, which it places in sensible positions.
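For comparison, a sketch of the non-ggplot2 route with rworldmap, assuming the package is installed. It joins on country names with `joinCode = "NAME"` and draws a base-graphics map. The data frame repeats only the six rows printed earlier and the map title is made up.

```r
library(rworldmap)

# Only the six rows shown by head(country_counts); not the full attached file
country_counts <- data.frame(
  country = c("Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi"),
  count   = c(40, 3, 9, 2, 24, 13)
)

# Join on country name; joinCode could instead be "ISO3" or "ISO2"
# if the data had a code column
sp_counts <- joinCountryData2Map(country_counts,
                                 joinCode = "NAME",
                                 nameJoinColumn = "country")

# Base-graphics choropleth of the joined data
mapCountryData(sp_counts,
               nameColumnToPlot = "count",
               mapTitle = "Contracts per country")
```

The join prints a summary of how many names matched and how many failed, which is a quick way to spot naming problems.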
My impression is that these things are not yet standard and easy in ggplot2. I think it's still worth sticking with ggplot2 because of the advantages it has, but I was a bit surprised that there aren't yet standard and easy ways to do even simple maps.
Given that we only need a limited amount of mapping facilities for the procurement menu enhancements at this stage, I think the focus should be on delivering that, in a way that works, and not worrying too much if it doesn't work completely generally at this stage. In the longer term, though, if we could do this well then I think we could be helping to fill a 'gap' by making some aspects of mapping in ggplot2 really easy.
Very useful start to this topic. I also agree with the conclusion, namely that we stick with ggplot and try to get something working. This may omit some important aspects for now, but they can come later. Also, if some work has to be done in matching names for the maps to work then that has to be done - by us if general, but often by the user, because their data will be involved, and that is where the names may be inconsistent.
Some more comments in no particular order.
geom_map seems to be a wrapper for geom_polygon, but I couldn't get a clear understanding of the difference. I didn't see anyone having strong opinions about either so I didn't look further, but it would be good to know.
That's a great link, and a good overview and it actually goes through the types of map that we need initially, useful for anyone who wants to know about ggplot mapping. I had seen these notes but not in this form, it looks very interesting and quite broad like the section on GitHub.
I wonder if the difference between the map and polygon might just be useful for us?
I notice that in the ggplot2 documentation for geom_map the examples section says: " When using geom_polygon, you will typically need two data frames: one contains the coordinates of each polygon (positions), and the other the values associated with each polygon (values). An id variable links the two together."
So he has just copied the example text from polygon for map. But when you look at the example itself I just wonder whether we might have a use for both? It is possible that the geom_map might save the merge. Or perhaps we could avoid doing the merging anyway, because of our linking? That seems to be the one difference, and it might be important for us. Hadley Wickham says geom_map is faster too.
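To make the "might save the merge" idea concrete, here is a sketch following the pattern of the geom_map documentation example: the values data frame is the plot data and the world outlines are passed separately through `map`, linked by `map_id`. The data frame repeats only the six rows printed earlier. Note that this draws only the countries present in the values data.

```r
library(ggplot2)

world <- map_data("world")

# Only the six rows shown by head(country_counts); not the full attached file
country_counts <- data.frame(
  country = c("Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi"),
  count   = c(40, 3, 9, 2, 24, 13)
)

# No join: map_id links country_counts$country to world$region
p_map <- ggplot(country_counts, aes(map_id = country)) +
  geom_map(aes(fill = count), map = world) +
  # geom_map does not set the axis limits itself
  expand_limits(x = world$long, y = world$lat) +
  coord_quickmap()

p_map
```

The name matching problem is unchanged: a country whose name is not in world$region is simply not drawn.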
This is very speculative, but let me also try to consider how a dialogue (or set of dialogues) might look:
a) I suggest it is likely to become more than one dialogue and it is an important topic. So, Describe > Mapping (or Maps, or Spatial) is after Describe > Specific.
b) I wonder if we have a Define Spatial Data dialogue. This might not be needed if your data is climatic or procurement, because it is defined there.
c) Then (usually?) there will be (at least) 2 data frames, and that is relatively new for us. One is "ours" with our data, that we want to map. And (usually) there is also a more general data frame with country boundaries etc. The Define Spatial Data dialogue could specify the lat and long columns (and others if needed) based on our data, and also specify the general data frame associated with our data.
d) Then I wonder if it would be useful to have a relatively simple dialogue to sort out the base for the map. This uses the general data frame(s). It could be quite simple for now, but might later include more data frames from ggmap or other sources. This would not use our own data, except that it would include the name of our data frame. So it could produce a base map object for the general data frame and also for our data frame, because of the links.
e) Then we have the dialogue to add our own data. Here we know the lat and long columns from our definitions. We could have a single receiver for polygons and perhaps another for points (maybe even a third for contours?)
f) Then we have our (usual?) sub-dialogues, i.e. perhaps up to 4, namely Polygon Options, Point Options, Contour Options and an overall one for Options, as now!
g) I am conscious I don't have labels!
We need to include some very simple mapping in the procurement menu. We need to be able to plot a value, such as the mean corruption risk index or more simply, the number of contracts, for a country or regions in a country.
Firstly, we just need to know how to do this in R, with ggplot, probably using the ggmap package, but I'm not sure if that's the only ggplot option. Attached is a sample dataset which is just a count of contracts for a set of countries which I would like to be displayed as a typical "heat" map, like in the image below:
So the first task is to produce a simple map like this using ggplot for the sample data. Once we know this R code we can then think about a dialog.