D-PLACE / dplace2

clld app serving the D-PLACE database
Apache License 2.0

Tweaks to map views and 'two-variable' map views #12

Open kirbykat opened 4 years ago

kirbykat commented 4 years ago

Hi, just digging into reviews on a paper and wanted to explore some of the reviewer questions by quickly visualising combinations of variables on the D-PLACE site. As I did this I came up with a few tweaks that I think would improve the map view, which overall is great - especially the ability to visualise two variables at once!

[To reproduce the search I did, go to the home page -> variables -> EA060. This produces a map view. Then add EA033 to the visualisation.]

  1. Is there a way to select colours for display (even when only examining a single variable) such that the contrast among them is maximized? E.g., could we use a pre-selected colour palette for variables with 2, 3, 4, etc. codes, with colours assigned so that the codes are maximally distinguishable when displayed on a map? ColorBrewer (https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) is a classic source for this type of palette (you can copy the RGB or other codes for selected palettes, with options for both ordinal and categorical variables), but I’m sure there are other plug-and-play packages we might easily integrate here?

  2. When two variables are selected, is there a way to extend the previous palette so that the same colours aren’t selected for both variables, as happened here? One thing that could help for cases where there are large numbers of codes would be to first rank codes by the number of societies with each code, and to prioritise the most distinguishable colours for the most common codes.

  3. Could the drop-down map legends be tweaked so that both can be viewed at once? Currently, if one is selected, the other collapses.

  4. Advanced wish: could a summary matrix appear, showing the number of cases with each combination of codes (e.g., number of societies with “Junior Age” and “Acephalous”; “Junior Age” and “One level”; etc., through to “Activity absent” and “Four levels”)? See the example pasted into the screenshot.

Screenshots-DPLACE
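A minimal sketch of the summary matrix requested in point 4, assuming each variable is available as a mapping from society ID to code. The function name, the society IDs, and the data layout are invented for illustration and are not taken from the dplace2 code base.

```python
from collections import Counter


def cross_tabulate(var_a, var_b):
    """Count societies for every (code in var_a, code in var_b) pair.

    var_a and var_b are hypothetical dicts mapping society ID -> code.
    Only societies coded for both variables are counted.
    """
    shared = var_a.keys() & var_b.keys()
    return Counter((var_a[s], var_b[s]) for s in shared)


# Invented example data; the code labels come from the issue text,
# the society IDs and assignments do not.
ea060 = {"s1": "Junior Age", "s2": "Junior Age", "s3": "Activity absent"}
ea033 = {"s1": "Acephalous", "s2": "One level", "s3": "Four levels", "s4": "Acephalous"}

matrix = cross_tabulate(ea060, ea033)
for (code_a, code_b), n in sorted(matrix.items()):
    print(f"{code_a} / {code_b}: {n}")
```

Societies missing from either variable (such as `s4` here) are left out of the counts; whether they should instead appear in an "NA" row or column is a design choice this sketch does not settle.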

kirbykat commented 4 years ago

One additional suggestion as I think about it:

  1. Could the preliminary marker size be reduced slightly, especially for the two-variable map?

xrotwang commented 4 years ago

Regarding the color palette: We do use colors that are said to be maximally distinguishable, see the references in this chunk of code: https://github.com/clld/clldutils/blob/92ab4cab4f9a39e0d6f20f09ef75e0ea4a11d025/src/clldutils/color.py#L58

So any alternative would require a bit of theory behind it - rather than just personal preference.

Using different palettes when displaying multiple variables seems like a good idea - but would require quite a bit of work to be generic enough (e.g. to specify all the subcases (categorical, ordinal, ...) for up to 5 variables).

kirbykat commented 4 years ago

Thanks @xrotwang.

Here is my suggestion - I'm sure there is a more technically correct method (from a colour perspective) to do this, but I think the simplest implementation of what I suggest below would not be too hard to do, and would avoid the orange-next-to-light-orange combination that led me to post this issue. Please let me know what you think...

For categorical variables:

Step 1. Rank codes in terms of the number of societies in the sample receiving the code. See invented example in screenshot below.

Step 2. Assign a colour to each code based on the order the colour is added to the pyramid in Fig. 20 at: https://personal.sron.nl/~pault/*, starting with the highest-frequency code. The reason for adding Step 1, and selecting colours in the order they are added to the pyramid, is to avoid the situation in the example I sent, where the two most common codes in EA060 were assigned the colours orange and light orange.
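Steps 1 and 2 could be sketched roughly as follows. This assumes the codes arrive as a flat list with one entry per society; the palette values are placeholders in a fixed priority order, not the actual colours from Fig. 20, and the function name is made up for this example.

```python
from collections import Counter

# Placeholder colours listed in priority order (most distinct first).
# These are NOT the values from the pyramid in Fig. 20.
PRIORITY_PALETTE = ["#0000ff", "#ff0000", "#00aa00", "#ffaa00", "#aa00aa"]


def assign_colours_by_frequency(codes_per_society, palette=PRIORITY_PALETTE):
    """Give the most frequent code the first colour, the next the second, etc."""
    counts = Counter(codes_per_society)
    # Step 1: rank codes by number of societies; ties are broken by code label
    # so that the result is deterministic.
    ranked = sorted(counts, key=lambda code: (-counts[code], str(code)))
    if len(ranked) > len(palette):
        raise ValueError("more codes than colours in the palette")
    # Step 2: hand out colours in the palette's fixed priority order.
    return dict(zip(ranked, palette))


colours = assign_colours_by_frequency(["A", "B", "A", "C", "A", "B"])
print(colours)
```

With this ordering, the two most common codes always receive the first two palette entries, which is the property the suggestion is after.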

image

*I chose the palette in Fig. 20 because it contains up to 23 colours and because one of the author’s colour schemes is already implemented in D-PLACE (I followed a link in the section of code you linked to – see line 90). The author claims the pyramid’s colours were selected to be “distinct, reasonably colour-blind safe and print-friendly”. However, it might make sense to use a different, fully colour-blind-safe palette for variables with <11 categories? [EDIT: See my next two comments below, in which I test out the pyramid on a colour-blind simulator. Based on that I would say that IF a new palette is going to be implemented, it makes sense to just use this versatile pyramid]. If we use a certified colour-blind-friendly palette, I still think it is worth using a ranking system to assign colours, so that the most commonly represented codes are most distinguishable. On the same site, just below Fig. 6, the author suggests an order in which colours should be assigned if they have to be picked in a fixed sequence (he says they can be picked randomly, but I do not agree, as a normal-vision person, that red and magenta are "as distinguishable" as, say, red and blue...).

image

**Open question: would it be possible to give users the option to toggle something and display/hide societies coded as "NA" on the map? This would be especially helpful when viewing two variables at once (as some societies coded as "NA" for one variable would likely have a code for the other), but would also be more generally useful in terms of reminding users of the global sample for a given dataset. If this could be added without too much trouble, I think it would make sense to use a consistent colour for NA, like the grey at the end of Fig. 16 on the above page (RGB: 136, 136, 136). If creating this new feature would create more problems than it would solve, it can of course be ignored...
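One possible shape for the NA toggle, as a sketch only: it assumes NA is represented as `None` and that a code-to-colour mapping already exists. The function and parameter names are hypothetical; the only value taken from the comment above is the grey RGB (136, 136, 136), which is `#888888` in hex.

```python
NA_COLOUR = "#888888"  # RGB (136, 136, 136), the grey suggested for NA


def marker_colours(codes_by_society, colour_by_code, show_na=True):
    """Return society ID -> marker colour, optionally hiding NA societies.

    codes_by_society: hypothetical dict of society ID -> code (None means NA).
    colour_by_code:   dict of code -> hex colour.
    """
    markers = {}
    for society, code in codes_by_society.items():
        if code is None:
            # NA societies get one consistent grey, or are skipped entirely.
            if show_na:
                markers[society] = NA_COLOUR
            continue
        markers[society] = colour_by_code[code]
    return markers


codes = {"s1": "A", "s2": None}
shown = marker_colours(codes, {"A": "#0000ff"}, show_na=True)
hidden = marker_colours(codes, {"A": "#0000ff"}, show_na=False)
```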

For sequential (ordinal) or continuous variables to be displayed on their own: use current method, or see (2) below.

When a categorical and sequential variable are to be displayed together (as in the example I tried that motivated this issue, where EA060 and EA033 were added to the same map) – two ideas.

(1) Always use a light grey to black scale for the continuous/ordinal variable, and use the method suggested above (colour pyramid) for the categorical variable? OR (2) Cut colours 18-25 and 27-28 (yellows to dark oranges) out of the above pyramid, and always use a yellow to dark orange sequential palette for the ordinal variable, like the one in Fig. 16 on the same website (https://personal.sron.nl/~pault/). Note, I think dark red [26] could stay, as it is quite distinct from the colours in Fig. 16.

image

Advantages of idea 1: easy to implement; if someone wants to show the categorical variable alone, and then layer on the continuous/ordinal variable, the colour scheme won’t change [BUT, maybe too much to worry about – the user can deal with the change in colour scheme when the second variable is added, but seeing both legends simultaneously will be important].

Disadvantages: Greys might not be distinguishable from some colours for colour-blind people and/or when printed. If someone adds the two variables in the opposite order (they first view the sequential, then the categorical), the colours will change unless we always display sequential data on a grey scale.
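For idea 1, a light-grey-to-black scale is simple to generate by linear interpolation. The sketch below is one way to do it; the endpoint values (204 for the lightest grey, 0 for black) and the function name are assumptions chosen for illustration, not something specified in this thread.

```python
def grey_scale(n, lightest=204, darkest=0):
    """Return n hex greys running evenly from `lightest` to `darkest`.

    The endpoints are assumed defaults (0-255 per RGB channel).
    """
    if n < 1:
        raise ValueError("need at least one level")
    if n == 1:
        values = [darkest]
    else:
        step = (darkest - lightest) / (n - 1)
        values = [round(lightest + step * i) for i in range(n)]
    # All three RGB channels share the same value, which yields a grey.
    return ["#{0:02x}{0:02x}{0:02x}".format(v) for v in values]


print(grey_scale(5))
```

Because the scale depends only on the number of ordinal levels, it stays the same whether the sequential variable is shown alone or layered over a categorical one.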

When two categorical variables are to be displayed on the same map:

What about ranking the frequency of the codes across the two variables, and then following steps 1 and 2 above to assign colours across the variables? This would ensure the most contrasting colours were assigned to the most frequent codes across the variables. But, whatever we do, it would be great if there were a way to avoid the same or very similar colours being displayed side by side.

Another option would be to split the pyramid and assign purples – greens (say colours <17) to one variable, and colours >=18 to the other. Again, the most contrasting colours should be selected by adding colours in the order they are added to the pyramid...
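Both options for two categorical variables can be sketched in a few lines. The data layout (variable ID mapped to a list of codes, one per society), the function names, and the palette values are all assumptions made for this example; the palette is a placeholder, not the Fig. 20 pyramid.

```python
from collections import Counter


def assign_jointly(codes_by_variable, palette):
    """Option A: rank (variable, code) pairs across both variables together."""
    counts = Counter(
        (var, code)
        for var, codes in codes_by_variable.items()
        for code in codes
    )
    # Most frequent pair first; ties broken by the pair itself for determinism.
    ranked = sorted(counts, key=lambda pair: (-counts[pair], pair))
    if len(ranked) > len(palette):
        raise ValueError("more codes than colours in the palette")
    return dict(zip(ranked, palette))


def split_palette(palette, split_at):
    """Option B: give each variable its own disjoint part of the palette."""
    return palette[:split_at], palette[split_at:]


# Placeholder palette and invented code lists.
palette = ["#0000ff", "#ff0000", "#00aa00", "#ffaa00"]
data = {"EA060": ["a", "a", "a", "b"], "EA033": ["x", "x", "y"]}

joint = assign_jointly(data, palette)
first_half, second_half = split_palette(palette, 2)
```

Option A guarantees that no colour is reused across the two legends; option B keeps each variable's colours within one hue range, at the cost of fewer colours per variable.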

xrotwang commented 4 years ago

Good job! I guess I'll first try to implement these ideas in the generic colour package. If this works, changes to the dplace code might be small.

kirbykat commented 4 years ago

ps. A cool 'colour blind simulator' someone sent me: https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40. I haven't yet plugged in the top of the pyramid...

kirbykat commented 4 years ago

For interest's sake, here are the first 9 levels of the pyramid from Paul Tol's Fig. 20, viewed through the simulator... not too bad... image