IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Small puzzle with boxplot dialog? #8995

Closed rdstern closed 4 weeks ago

rdstern commented 1 month ago

@jkmusyoka, @MeSophie and @Vitalis95: as I write this issue, I have a suggestion for solving the puzzle. It concerns a seasonal plot of Dodoma monthly rainfall totals. But then I uncovered a real problem, I hope for Vitalis?

So most of this is for reading - good statistics here I think!

The data are here

dodoma_Ghana_walter (2).zip

I realised that boxplots on the monthly data show the pattern of some percentage points, so here was my first set of boxplots

image

Almost fine. But this isn't a graph to show outliers, so let's change the coef from 1.5 upwards, so the lines go to the ends of the data.

Here is the graph:

image

That's fine - except for the really dry months of June to September. And that is with a coefficient of 1000!

That's because in those months more than 3/4 of the years are dry, so the interquartile range is zero, and any coefficient times zero is still zero.
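A minimal sketch of why the coefficient has no effect here, in plain Python rather than R. It assumes the usual Tukey rule (whiskers reach the most extreme values within coef times the interquartile range of the quartiles); the data values, the function name `tukey_whiskers` and the quantile method are illustrative assumptions, not taken from the Dodoma data or from ggplot.

```python
from statistics import quantiles

def tukey_whiskers(values, coef=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under Tukey's rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - coef * iqr    # equals q1 when iqr is zero
    high_fence = q3 + coef * iqr   # equals q3 when iqr is zero
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers

# Made-up July totals (mm) for 10 years: 8 dry years and 2 with some rain.
july = [0, 0, 0, 0, 0, 0, 0, 0, 3.2, 11.5]

for coef in (1.5, 1000):
    # Both coefficients give the same whiskers and the same outliers.
    print(coef, tukey_whiskers(july, coef))
```

With both quartiles at zero the fences sit at zero whatever the coefficient, so every rainy year is still drawn as an outlier.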

a) When I started this issue I just had one solution. I found that if you make coef=NULL then all is ok, and that is really easy to do as a small tweak of the script.

image

Problem solved. But

b) Let's think a bit more. It is a bit silly to use boxplots on data with so many zeros, without taking account of them! So filter out the zeros and then make the box width reflect the sample size that is left. So this would be better:

image

That's a better graph! Interesting example for teaching too. And the fact that the last (variable width) graph is much more sensible makes the solution obvious. We don't try to be clever and permit NULL. It is so rarely needed. It can be done, via the script, but shouldn't be needed, because there is a better way?
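The idea in (b) can be sketched in plain Python as follows. The records are made up, the helper names are hypothetical, and the square-root scaling is an assumption based on how ggplot2 documents its variable-width option; this is not the script R-Instat generates.

```python
from collections import defaultdict
from math import sqrt

# Made-up (month, rainfall in mm) records; not the Dodoma data.
records = [
    ("Jan", 120.4), ("Jan", 98.0), ("Jan", 0.0), ("Jan", 143.7),
    ("Jul", 0.0), ("Jul", 0.0), ("Jul", 3.2), ("Jul", 0.0),
]

def nonzero_by_month(rows):
    """Drop the zero totals and group what is left by month."""
    groups = defaultdict(list)
    for month, rain in rows:
        if rain > 0:
            groups[month].append(rain)
    return dict(groups)

def relative_widths(groups):
    """Box width relative to the largest group, scaled by sqrt(n)."""
    largest = max(len(v) for v in groups.values())
    return {m: sqrt(len(v) / largest) for m, v in groups.items()}

rainy = nonzero_by_month(records)
widths = relative_widths(rainy)
print(rainy)
print(widths)
```

A month with few rainy years then gets a visibly narrower box, which is a reminder that its quartiles rest on very little data.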

Now Vitalis, there is a real problem lurking in the Boxplot dialog. Here it is:

image

Notice the Variable Width checkbox is moved to the right. So what is the Width checkbox on the left doing now???

The answer is partly that we should change the label to Cut Width. The main part of the answer, though, is that it should only be visible when the x variable is numeric. This is a new feature in the amazing ggplot system, see here at the bottom. (Note it does not apply at all when - as usual - the x is a factor,