benjann / geoplot

Stata module to draw maps
MIT License

legend cuts #6

Closed asjadnaqvi closed 1 year ago

asjadnaqvi commented 1 year ago

One good thing about the spmap package was that it offered a large set of options for cutting the legend: quantile, boxplot, eqint, stdev, kmeans, custom, and unique. The two common ones with geoplot are custom and eqint. The one that shows the colors best is quantile, which is also the default in most GIS software.

Below are comparisons for all three:

colorpalette CET L20, n(10) nograph reverse 
local colors `r(p)'

spmap y_MEDAGEPOP using nuts3_shp, id(_ID) ///
    legend(pos(1)) legstyle(2) fcolor("`colors'") cln(10)   ///
    ocolor(gs10 ..) osize(0.03 ..) ///
    ndocolor(gs10 ..) ndsize(0.03 ..) ndfcolor(gs8)  ///
    polygon(data(nuts_shp_lvl) select(keep if level>1) by(level) ocolor(gs4 black) osize(0.05 0.1) legenda(on) legt({it:NUTS boundaries})  ) ///
    title("spmap - quantile (default)")

colorpalette CET L20, n(10) nograph  reverse
local colors `r(p)'

spmap y_MEDAGEPOP using nuts3_shp, id(_ID) ///
    legend(pos(1)) legstyle(2) fcolor("`colors'") cln(10) clm(eqint)    ///
    ocolor(gs10 ..) osize(0.03 ..) ///
    ndocolor(gs10 ..) ndsize(0.03 ..) ndfcolor(gs8)  ///
    polygon(data(nuts_shp_lvl) select(keep if level>1) by(level) ocolor(gs4 black) osize(0.05 0.1) legenda(on) legt({it:NUTS boundaries})  ) ///
    title("spmap - eqint (equal interval)")

geoplot ///
    (area nuts3_shp y_MEDAGEPOP, level(10) colors(CET L20, reverse) )  ///
    (line nuts1_shp, lc(white) lw(0.03) ) ///
    (line nuts0_shp, lc(black) lw(0.2)) ///
    , ///
    legend(pos(2)) ///
    title("geoplot - equal interval (default)")

Here we can see that the quantile cuts really make it easy to see the variations.

[Images: spmap1, spmap2, geoplot_test_legend]

benjann commented 1 year ago

The latest update now introduces suboption quantile (abbreviation q) to levels(). So you can type levels(10, quantile). Give it a try. The option is not documented yet. Other rules could be added without much trouble.
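To make the difference between the two rules concrete, here is a minimal sketch that computes equal-interval and quantile class breaks by hand. It uses Stata's shipped auto dataset and the variable price purely as a stand-in for a map variable; it is not geoplot's internal code, and the scalar and macro names are illustrative.

```stata
* Sketch only: class breaks computed by hand on a stand-in dataset
sysuse auto, clear
local k   = 5                        // number of classes
local km1 = `k' - 1                  // number of interior breaks

* Equal-interval breaks: split the range [min, max] into k equal-width steps
quietly summarize price
scalar lo   = r(min)
scalar step = (r(max) - r(min)) / `k'
forvalues i = 1/`km1' {
    display "equal-interval break `i': " lo + `i'*step
}

* Quantile breaks: each class holds roughly the same number of observations
_pctile price, nquantiles(`k')
forvalues i = 1/`km1' {
    display "quantile break `i': " r(r`i')
}
```

For a right-skewed variable such as price, the quantile breaks tend to sit much closer together at the low end than the equal-interval breaks do, which is why the quantile map shows more color variation.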

One question: I now count each unit once (i.e. equal weight) when taking quantiles, assuming this is how spmap does it. Might there be situations in which it would make sense to weight the units (e.g. by the relative size of each unit's area)? In that case I would need to add a further option to specify the relevant weights...
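As a rough illustration of what weighting would change, the sketch below compares unweighted and weighted quartile breaks using Stata's _pctile. The auto dataset and its weight variable (vehicle weight) are only stand-ins for a map variable and an area variable; this is an assumption for demonstration and does not show geoplot's own syntax.

```stata
* Sketch only: unweighted vs weighted quantile breaks on stand-in data
sysuse auto, clear

* Every unit counts once
_pctile price, nquantiles(4)
display "unweighted breaks: " r(r1) "  " r(r2) "  " r(r3)

* Units counted in proportion to a weight variable (stand-in for area)
_pctile price [aweight = weight], nquantiles(4)
display "weighted breaks:   " r(r1) "  " r(r2) "  " r(r3)
```

With area weights, each class covers roughly the same share of total area rather than the same number of units.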

asjadnaqvi commented 1 year ago

Works! I think the next most used options in maps are kmeans, standard deviations, and jenks. The last is not implemented in spmap but is a popular option in ArcGIS and QGIS; it provides an intermediate solution between equal intervals and quantiles. Weighting is hardly used, so it could be added but is probably not needed.

[Image: geoplot_test3]

benjann commented 1 year ago

kmeans, standard deviations, and jenks

Do you have a convenient source for these methods that you could share?

asjadnaqvi commented 1 year ago

kmeans and stdev are both already implemented in spmap, so they can be ported over. jenks is a more intensive clustering algorithm, and Nick Cox has a program, group1d, that comes close to it.

For a more exact implementation, one probably has to look at Python or R source code.

benjann commented 1 year ago

Ah, I meant a description like on Wikipedia or so. But recycling code from others is fine too :)

Conceptually, if using a perceptually uniform color scheme (such as viridis), one could argue that equidistant intervals should be used because otherwise the correspondence between color perception and the underlying quantity is broken.

asjadnaqvi commented 1 year ago

I understand the above sentiment, but it rarely works for maps since it requires the data to be fairly uniformly spaced (which it is in the example I posted). Data tends to be skewed or clustered within certain ranges, and just one weird outlier can mess up the equal intervals. Also, map visuals one sees online often distort the bins, perhaps to highlight a certain aspect of a story. Additionally, in my bimap package I allow users to display the bins with their actual spacing. This can also be a neat option!
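The outlier point can be checked with a small simulation. This is a sketch on made-up data (100 uniform draws with one value replaced by 100), not the NUTS data from the thread; the variable and scalar names are illustrative.

```stata
* Sketch only: effect of a single outlier on equal-interval vs quantile bins
clear
set seed 12345
set obs 100
generate double y = runiform()       // values between 0 and 1
replace y = 100 in 1                 // one extreme outlier

* Equal-interval classes over [min, max]
quietly summarize y
scalar lo   = r(min)
scalar step = (r(max) - r(min)) / 5
generate byte bin_eq = min(floor((y - lo) / step) + 1, 5)

* Quantile classes for comparison
xtile bin_q = y, nquantiles(5)

tabulate bin_eq                      // 99 observations fall into class 1
tabulate bin_q                       // classes are evenly filled
```

With equal intervals the outlier stretches the range so much that every other observation shares one color, whereas the quantile classes remain evenly populated.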

benjann commented 1 year ago

I now added method kmeans to levels(). For method quantile it is now also possible to specify a variable containing weights.
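For readers unfamiliar with k-means cuts, the idea can be sketched with Stata's built-in cluster kmeans on a single variable. This uses the auto dataset as a stand-in and is not necessarily how geoplot implements the method; the group variable name cls and the seed are arbitrary choices.

```stata
* Sketch only: one-dimensional k-means classes on stand-in data
sysuse auto, clear

* Partition price into 5 clusters; fixed seed for reproducible starting centers
cluster kmeans price, k(5) start(krandom(12345)) generate(cls)

* Each cluster's minimum and maximum give the implied class boundaries
tabstat price, by(cls) statistics(n min max)
```

Unlike quantiles, the resulting classes can hold very different numbers of observations, because the breaks follow natural gaps in the data.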

asjadnaqvi commented 1 year ago

I would close this issue for now since the basic set of cut-offs is now in place.