benjann / geoplot

Stata module to draw maps
MIT License

suggestions for `scatter()` options #8

Closed asjadnaqvi closed 1 year ago

asjadnaqvi commented 1 year ago

Several suggestions to align scatter() with standard usage:

The graph is produced using the following code:

geoplot ///
    (area nuts3_shp y_MEDAGEPOP if CNTR_CODE=="FR", level(8))  ///
    (line roads_shp if layer!=., colvar(layer) colors(tab Blue-Teal) lw(0.1 0.3 1) ) ///
    (line nuts1_shp if CNTR_CODE=="FR", lc(black) lw(0.1) ) ///
    (scatter places_shp [aw = population] if population>100000, colors(tab Orange)  mc(%50) msize(2)) ///
    , legend(pos(2))

PS. The line() options are looking great! Have also tested it extensively and have not found any obvious issues.

[image: geoplot_test4]

benjann commented 1 year ago

scatter is now called point (scatter is still available as an alias).

benjann commented 1 year ago

Regarding your comment about colors(): if colvar() is not specified, option colors() is interpreted as the regular color() option of Stata graphs. In this case you cannot use a palette specification, just a single color...

benjann commented 1 year ago

If you want to scale the points by population and at the same time vary the color by population, you need to specify ... [aw = population], colvar(population) ....
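A minimal sketch of that combination, applied to the point layer from the original post. It reuses the data, variable, and option names exactly as they appear in that post (option names as they stood at this point in the thread); the only change is the added colvar(population), so that colors() is read as a palette.

```stata
* Sketch only: size the markers by population AND colour them by population.
* places_shp, nuts3_shp and population are taken from the original post.
geoplot ///
    (area nuts3_shp y_MEDAGEPOP if CNTR_CODE=="FR", level(8)) ///
    (scatter places_shp [aw = population] if population>100000, ///
        colvar(population) colors(tab Orange)) ///
    , legend(pos(2))
```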

benjann commented 1 year ago

I see that you tried to use different line widths for the roads (lw(0.1 0.3 1)) in connection with colvar(layer); this will not work. In principle, something like this could be implemented, but currently I do not parse the regular graph options, they are just passed through. Don't know whether I want to get myself into parsing all these options. If you want the roads in different colors and line widths you can just add multiple layers, each plotting one type of roads.
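The multiple-layer workaround could look roughly like the sketch below. The coding of layer as 1, 2, 3 and the specific colors and widths are assumptions made for illustration; only the dataset and option names come from the original post.

```stata
* Sketch only: one line layer per road type, each with its own colour and width.
* ASSUMPTION: layer is coded 1, 2, 3; colours and widths are placeholders.
geoplot ///
    (area nuts3_shp y_MEDAGEPOP if CNTR_CODE=="FR", level(8)) ///
    (line roads_shp if layer==1, lc(ltblue) lw(0.1)) ///
    (line roads_shp if layer==2, lc(blue)   lw(0.3)) ///
    (line roads_shp if layer==3, lc(navy)   lw(1)) ///
    (line nuts1_shp if CNTR_CODE=="FR", lc(black) lw(0.1)) ///
    , legend(pos(2))
```

Because lc() and lw() are regular graph options that are passed through unchanged, each layer gets exactly the styling given to it.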

benjann commented 1 year ago

I reopen this because having more control over the sizes of markers would be great. Don't know yet how this could be implemented. Stata graphs do not provide any options to control how the weights are translated into sizes, but possibly such control could be exercised by transforming the weights. For this, however, precise knowledge of the algorithm used is needed (and maybe it's not just an issue of the min and max, but also of other features of the distribution of the weights). I'll see whether I can find the algorithm somewhere in Stata's graph ados.

An alternative approach would be to implement sizes similar to how geoplot implements colors, i.e. by overlaying multiple plots (one for each color/size). However, this would require categorizing the weights. A question would also be how to combine this with colors... Uhh, this reminds me that combining weights and colvar() will not work as expected because Stata will determine sizes within each color-group separately, not across all groups. I should find a way to fix this.

asjadnaqvi commented 1 year ago

How I have done it for some of my packages is to either (a) group variables into ranks based on some ranges and assign a line width to each rank, or (b) derive the weights and loop over each weight to assign a weight (a highly inefficient option, but it works in specific cases). The treemap implements both, with line width lists for each level and label scaling based on the size of the box.

For (a) a custom legend needs to be generated but it allows for a lot of control. Btw spmap also lets users specify starting and end values to control the marker scaling. The implementation can be improved though.
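As a rough illustration of approach (a), a rank variable can be built in plain Stata before plotting. The variable name popcut, the cut points, and the labels below are hypothetical choices for this sketch, not something taken from any package.

```stata
* Sketch only: group a continuous size variable into three ranks.
* ASSUMPTIONS: population is the size variable; cut points are arbitrary.
generate byte popcut = .
replace popcut = 1 if population <  250000
replace popcut = 2 if population >= 250000 & population < 1000000
replace popcut = 3 if population >= 1000000 & !missing(population)

* value labels that a custom legend could pick up
label define popcut 1 "under 250k" 2 "250k to 1m" 3 "1m or more"
label values popcut popcut
```

Each rank can then be given its own size or line width, at the cost of having to build the legend by hand.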

I have also asked Stata a few times to allow users to define line weights (like marker weights) through variables, which would be a great feature to have.

Related to this, each layer should have a legend option. I don't know how this would work with frames though...

benjann commented 1 year ago

I'll think about how to implement an option for scaling and line widths; one possibility would be to rename colvar() so that it has nothing to do with color and is just a general variable for categorizing. Then levels()/cuts()/discrete will generate the groups and the user can select what should be varied by specifying lists in colors(), lwidth(), msize(), etc. The only restriction of this approach is that the same categorization will be used for all features. But maybe this is reasonable; crossing different categorizations will potentially lead to a very large number of plots to be overlaid.

As for the legend, it is on my list to generalize the layer() suboption in legend() so that multiple layers can be included. There will also be a possibility to specify subtitles or create multiple legend columns.

benjann commented 1 year ago

I now solved the issue with weighted marker sizes not being comparable across colvar() subplots within a layer. In fact, I now changed the code so that the weighted marker sizes will also be comparable across all layers of the graph. In some situations users might not want that, but then they can just modify the scale of the weights in the data used for the different layers, so that the relation between marker sizes across layers is how they want it to be (conscious choice rather than obscure automatism). Also note that you can always use the msize() option to change the overall scale of the markers within a layer.

I also added a global option wmax() that can be used to make the scaling of markers comparable across different graphs (provided the graphs have the same dimensions).

The trick was to include two additional observations with weights 0 and 1 (and coordinates set to missing) in the working data used for the graph, normalize all weights to [0,1] (or possibly to less than one if wmax() is specified), and include the two additional observations in all plots that use weights.
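The idea can be sketched in plain Stata as follows. This is not geoplot's actual code: the toy data, the variable names, and the choice to normalize by dividing by the maximum weight (or by a user-supplied upper bound) are all assumptions made for illustration.

```stata
* Sketch only, NOT geoplot's actual code.
* ASSUMPTIONS: toy data; weights are divided by their maximum (or by a
* user-supplied upper bound); all names are hypothetical.
clear
input double(x y w)
1 1  200
2 3 1000
3 2  500
end

local wmax = .                      // set to a number to mimic wmax()
summarize w, meanonly
if missing(`wmax') local wmax = r(max)
generate double wnorm = w / `wmax'  // weights now lie in [0,1]

* two anchor observations: coordinates stay missing, weights are 0 and 1
set obs `=_N + 2'
replace wnorm = 0 in `=_N - 1'
replace wnorm = 1 in `=_N'

list x y wnorm
```

Since the anchors have missing coordinates they are never drawn, but any weighted plot that includes them sees the same weight range, which is what makes the marker sizes comparable across plots.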

Note that I renamed colvar() to cvar() because the plan is to eventually also provide the possibility to make line widths or other attributes depend on it ("c" could stand for color but also for "categorize" or so).

benjann commented 1 year ago

I now implemented support for line widths depending on cvar(). This seems to work well. You could, for example, type cvar(layer) color(tab Blue) lw(0.1(.1).5) or similar to vary both color and line width. To only vary line width just specify a single color, say color(gs6); to only vary color, omit lwidth() or specify a single line width.
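For instance, a layer that varies only the line width might be written as in the sketch below. It uses the cvar() name introduced by the rename and borrows the dataset and variable names from elsewhere in this thread; the discrete option and the three widths assume that layer has three categories.

```stata
* Sketch only: single colour, line width varies with the road type.
* ASSUMPTION: layer has three categories, hence discrete and three widths.
geoplot ///
    (area nuts3 if CNTR_CODE=="FR") ///
    (line roads if layer!=., cvar(layer) discrete color(gs6) lw(0.1 0.2 0.3)) ///
    , legend(pos(2))
```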

I will add support for some further elements such as msize(), msymbol(), or lpattern(). I will probably also deactivate the viridis default for color() so that a single default color is used if color() is omitted.

asjadnaqvi commented 1 year ago

Working with points:

geoplot ///
    (area nuts3 if CNTR_CODE=="FR", )  ///
    (scatter places  [aw = population] , cvar(popcut)  msize(1 5 10) discrete color(tableau, opacity(50)) mlcolor(white) mlwidth(vvthin)) ///
    , legend(pos(2)) 

[image: geoplot_test6]

asjadnaqvi commented 1 year ago

and also working with lines:

geoplot ///
    (area nuts3  if CNTR_CODE=="FR")  ///
    (line roads if layer!=., cvar(layer) color(tab Blue-Teal) lw(0.05 0.1 0.2) ) ///
    (line nuts1 if CNTR_CODE=="FR", lc(black) lw(0.1) ) ///
    (point places [aw = population] if population>100000,  mc(orange%50) msize(2)) ///
    ,   legend(pos(2))

If multiple line layers are specified, then the legend shows both lines. Is it possible to add a layer-specific nolegend option?

[image: geoplot_test4]

benjann commented 1 year ago

In the point example the points have different sizes because you specified weights; I did not implement cvar-support for msize() yet. In the second example, it seems you have 5 types of roads, but you only specified 3 widths, so the widths are repeated. In the next update it will be possible to specify lwidth(a b) and then an appropriate number of values between a and b will automatically be generated.
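What such an expansion of lwidth(a b) could amount to is sketched below in plain Stata. Evenly spaced (linear) values are an assumption; the thread does not say how geoplot generates the intermediate widths, and the macro names are hypothetical.

```stata
* Sketch only: k evenly spaced widths between a and b.
* ASSUMPTION: linear spacing; geoplot may generate the values differently.
local a = 0.1
local b = 0.5
local k = 5                         // number of groups, must be at least 2
local widths
forvalues i = 1/`k' {
    local w = `a' + (`b' - `a') * (`i' - 1) / (`k' - 1)
    local widths `widths' `: display %4.2f `w''
}
display "`widths'"
```

The resulting list has one width per group, so no width has to be recycled when there are more groups than values typed by the user.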

asjadnaqvi commented 1 year ago

There are exactly 3 road types in this case: [image]

If I use the discrete option then it shows up fine:

geoplot ///
    (area nuts3  if CNTR_CODE=="FR", lw(0.2) lc(black))  ///
    (line roads if layer!=., cvar(layer) discrete color(tableau) lw(0.1 0.2 0.3) ) ///
    ,   legend(pos(2))

[image: geoplot_test4_2]

benjann commented 1 year ago

Ah yes, of course, discrete is needed in this case.

benjann commented 1 year ago

Note that in the latest update I renamed option cvar() to zvar() (makes much more sense, so we have X, Y, and Z). Furthermore, zvar() now supports many more styling aspects such as line patterns, marker symbols, or marker sizes.

asjadnaqvi commented 1 year ago

If cvar() is specified, it gives this weird error: [image]

Probably better to block its use completely.

benjann commented 1 year ago

This is due to the introduction of option color without argument, which caused color() to be renamed internally to color2(). If there is no zvar(), the option is passed on to graph as is, and I forgot that it needs to be renamed back to color(). I now fixed this (not online yet). However, note that in the current implementation the interpretation of color() also changes depending on whether zvar() is there or not. With zvar(), color() has full support for palettes from colorpalette; without zvar(), support is only for single color specifications (not only Stata colors, any non-palette color specification allowed by colorpalette, but no palettes and no palette options). This means that your example above would still produce an error (something like "tableau is not a valid color" or so). Do you think that this change in interpretation is confusing? I could change things such that color() has full support for single colors as well as palettes in all cases, even if palettes are not really all that useful if there is no zvar() (as only a single color is needed if there is no zvar()).
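Under the interpretation described here, the two cases would be written roughly as follows. The dataset and variable names are borrowed from the earlier examples in this thread, and popcut is assumed to be a categorical variable; this is a sketch of the described behavior, not tested output.

```stata
* Sketch only. With zvar(): color() is read as a palette, one colour per group.
geoplot ///
    (area nuts3 if CNTR_CODE=="FR") ///
    (point places [aw = population], zvar(popcut) discrete color(tableau)) ///
    , legend(pos(2))

* Without zvar(): color() must be a single colour specification.
geoplot ///
    (area nuts3 if CNTR_CODE=="FR") ///
    (point places [aw = population], color(orange%50)) ///
    , legend(pos(2))
```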

benjann commented 1 year ago

The color2() error is now fixed.