benjann / geoplot

Stata module to draw maps
MIT License

Ability to scale shapes based on plotted variable #10

Closed: asjadnaqvi closed this 1 year ago

asjadnaqvi commented 1 year ago

This feature would be great to have! This would be similar to what heatplot does, and also exists in spmap.

This would allow for really interesting visualizations by scaling the same shapes for different variables on top of each other.

benjann commented 1 year ago

yes, this is already on my list

benjann commented 1 year ago

Hi Asjad, I have a question about this. I see two possibilities:

(1) Make shapes smaller or bigger by a factor proportional to the (normalized) values of a variable. A value of, say, 0.5 makes a shape half as large as its original size; the reference, so to speak, is the original size of the shape, and a value of 1 leaves the size unchanged.

(2) Recompute the sizes of the shapes such that the resulting sizes are proportional to the (normalized) values of a variable across all plotted shapes. A value of 0.5 makes a shape half as large as a shape with value 1, irrespective of the original sizes of the two shapes; that is, the reference is a situation in which all shapes have the same size. If the variable is a constant, the plotted shapes will all end up with the same size.

Variant (1) is very simple to implement (subtract the centroid from the coordinates, multiply by the square root of the scaling factor, add the centroid back in). Variant (2) can be implemented by first dividing the scaling factor by the original size of the shape and then proceeding as in variant (1). A complication of variant (2), of course, is that one has to know the original sizes of the shapes.
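
The arithmetic of both variants is language-agnostic; here is a minimal Python sketch (the function names and the list-of-tuples shape representation are mine, not geoplot's):

```python
import math

def scale_shape(coords, centroid, factor):
    """Variant (1): rescale a shape about its centroid so that its
    *area* changes by `factor` (hence the square root, since area
    grows with the square of linear size)."""
    cx, cy = centroid
    s = math.sqrt(factor)
    return [(cx + (x - cx) * s, cy + (y - cy) * s) for x, y in coords]

def scale_shape_proportional(coords, centroid, value, orig_area):
    """Variant (2): divide by the shape's original area first, so the
    resulting areas are proportional to `value` across all shapes."""
    return scale_shape(coords, centroid, value / orig_area)
```

With factor 0.25, for example, a unit square shrinks to a square of side 0.5 centered on the same centroid, i.e. a quarter of the original area.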

spmap uses variant (2); the centroids and the sizes (i.e. areas) are computed on the fly. My policy for geoplot, however, is to only plot and not do complicated computations in the background. So I have now implemented variant (1) (not online yet), but I am afraid that this is not what one would typically want. For variant (2) it would be the user's job to divide the variable by the shape sizes before running geoplot. (E.g. scaling by population density would make shapes proportional to total population.)

In essence my question is: Do you see any value in providing variant (1), or should I rather enforce variant (2) to prevent confusion and misinterpretation? With variant (1) I see the danger that users accidentally interpret results as if they were computed according to variant (2).

My approach to implementing variant (2) would be to throw an error if no variable containing the sizes of the shapes is available in the frame. In geoframe there would be a possibility to declare the name of the relevant variable (with a default such as _SIZE or _AREA or _ASIZE). I could also provide a command such as geoframe generate size to compute the areas of the shapes should such a variable not be available in the original data. (Do you know whether information on the area is typically included in the attribute section of a shape file? My experience with shape files from Switzerland is that such information is included, but I do not know whether this is standard.)
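
Computing the area of a polygon from its ring coordinates is a standard calculation (the shoelace formula); a sketch of the kind of computation a command like the proposed geoframe generate size would have to perform (plain Python, my own function name):

```python
def polygon_area(coords):
    """Area of a simple (non-self-intersecting) polygon via the
    shoelace formula; `coords` is a ring of (x, y) vertices."""
    s = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the ring
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0
```

For shapes with holes, the hole rings' areas would additionally have to be subtracted.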

asjadnaqvi commented 1 year ago

Dear Ben,

Thanks for looking into this! The honest answer is that scaling is hardly used in maps; it is more of an indicative element to show additional variation. So complicated calculations are not necessary, but if the options are easy to program, then people can play with them. It's the same logic as legend cutoffs, which have no stringent rules.

Two possible uses I can see are the following: (A) A choropleth map is drawn for one variable, e.g. population, but shapes are scaled by another, e.g. population density (where the highest density is normalized to 1). This is effectively a map showing two distributions in one go. Here the scaling of course depends on whether it is a simple factor reduction of coordinates or the areas are made proportional. Both are subjective decisions.

(B) Shapes are scaled across two layers based on the global max across the two layers, for example the proportion of people who voted Democrat vs. those who voted Republican. Here the proportions are important since shares are out of 1 or 100. One can define a manual option such as max(100) to ensure the normalization is comparable. One can also use the max() (or a comparable) option to blow up some shapes; e.g., max(50) would scale a value of 60 by 1.2x.

So I can see myself using both your variant (1) and the ability to define the normalization as in (B).

benjann commented 1 year ago

Thanks. I will likely implement both: variant (1) as [weight], variant (2) as option size().

benjann commented 1 year ago

I have now implemented both variants: variant (1) as weights, variant (2) as option size(). To control the overall scale there are global options wmax() and dmax(). Give it a try.

asjadnaqvi commented 1 year ago

Thanks for adding this. Some points:

Issue 1: Something is off with the cutoffs and the placement of the legend:

geoplot ///
    (area statehex olddepratio, lc(black) level(10) color(viridis, reverse) )  ///
    ,   legend(pos(5)) 

Here 10 levels and position 5 are not working:

[screenshot: geoplot_test6]

Issue 2: The scaling is working well:

geoplot ///
    (area statehex , size(olddepratio)   lc(black) fcolor(yellow)  )  ///
    , legend(pos(4)) 

[screenshot: geoplot_test7]

But if dmax() or wmax() is specified, the command throws an error:

[screenshot of the error message]

Issue 3:

If two scales are specified, then each is scaled by the global maximum (I think):

geoplot ///
    (area statehex , size(youngdepratio) lc(black) fcolor(black%40) )  ///
    (area statehex , size(olddepratio)   lc(black) fcolor(yellow)  )  ///
    ,   legend(pos(4)) 

[screenshot: geoplot_test8]

Here each layer should be treated individually, based on the max of the individual layer:

[image omitted]

And wmax() and/or dmax() should allow us to change the maximum of each layer. For example, if I want olddepratio to be scaled by 10 rather than by its observed maximum of 8.8333, then I should just say max(10). This ensures that the shape containing the maximum value does not have a scale factor of 1 but rather 0.883. It also allows me to blow up the shapes: e.g., if I set the max to 5, then the shape with the highest value scales by 8.83/5 = 1.766. How much the other shapes scale down then depends on the algorithm used.
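
The arithmetic of such a manual maximum is simple; a sketch in plain Python (a hypothetical helper, not geoplot syntax):

```python
def scale_factors(values, manual_max=None):
    """Map raw values to scale factors. By default the observed
    maximum gets factor 1; supplying a manual maximum shifts the
    reference, so factors can also exceed 1 (blowing shapes up)."""
    ref = manual_max if manual_max is not None else max(values)
    return [v / ref for v in values]
```

For instance, `scale_factors([8.8333, 4.0], manual_max=10)` yields roughly [0.883, 0.4], whereas a manual maximum of 5 would push the factor for 8.83 up to 1.766.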

benjann commented 1 year ago

Ok, I see, you need more flexibility here. I will change things such that normalization is within layer by default and such that you can specify a custom normalization for each layer. Currently dmax() and wmax() are global options; this is why you get an error in Issue 2. Possibly I will also change dmax() so that it is relative (or so that you can use it in a relative or an absolute way; e.g., dmax(*1.5) would make all shapes 50% larger).

Issue 1: It seems I broke the level() option. I'll fix this in the next update.

benjann commented 1 year ago

Issue 1 is now fixed.

benjann commented 1 year ago

I have now made the corresponding changes. The global dmax() and wmax() options are discontinued; densities and weights are now normalized separately within each layer, based on the observed maximum density or weight. Option size() now has suboptions scale() to multiply the sizes and dmax() to set the maximum density for normalization. Option wmax() is now available within layers to set the maximum weight per layer.
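
The new per-layer default can be summarized in a one-liner (plain Python, illustrative only):

```python
def normalize_layers(layers):
    """Normalize each layer by its own observed maximum, so the
    largest shape in every layer gets scale factor 1 -- replacing
    the old behavior of normalizing by the global maximum."""
    return [[v / max(layer) for v in layer] for layer in layers]
```

With two layers [1, 2] and [5, 10], both now come out as [0.5, 1]; under the old global normalization the first layer would have been divided by 10 instead.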

benjann commented 1 year ago

I believe these issues are solved; I'll close this.