Closed: briandk closed this issue 3 years ago
Overplotting is clearly something we want to avoid, but there are several ways to deal with it. After a good deal of thought, I want to recommend the following (and this does remove the overplotting logic as per Brian's narrative above).

Suppose a subset of k means, and hence the effects for the corresponding groups [m < j > - grandmean], are 'sufficiently close' to one another that the case-data points run into one another. (Okay to be 'liberal' about choosing k; better to overestimate than underestimate.) Now proceed to alter the .1w graphic (so it will lack complete fidelity wrt the initial means), as follows. Compute the median (md) of these k means. Now compute 'pseudo effects' by subtracting md from each of the k means; these will be of the form m < j > - md. Multiply all pseudo effects by a constant W, where W exceeds unity by a positive constant w, and w is a function of the range R of all means (e.g., w = R/25 seems reasonable, which leads to W = 1 + R/25). Finally, add the midrange mr to each of these revised pseudo effects; these will be of the form W*(m < j > - md) + md.

That's it. These k values will serve as replacements for the original group means. They will necessarily be more widely separated from one another than the original k means, simply because W exceeds unity. The average of all 'means' will not be changed (more than trivially) from the original grand mean, and the printed table should probably just ignore these adjusted means (except in trials?), so it will only be the graphic that has been altered. Again, perfect fidelity w/ the original data will of course have been lost in the graphic, but the gain will more than compensate for the rather minor changes in the data-to-be-plotted.
NB: I had written m sub j using < and > to index j, but these have been lost here. I shall put the original in a Word document, which I'll email or post (? where on github), so that the details are not lost.

b

NB2: My edits, now w/ spacing, seem to have fixed the problems. Let me know what is unclear.
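For readers who want to try the arithmetic, here is a minimal sketch of the proposal, written in Python purely for illustration (it is not granova code). It uses the form W*(m < j > - md) + md given in the post, with W = 1 + R/25. The names `spread_close_means`, `subset`, and `divisor` are hypothetical, and choosing which k means count as 'sufficiently close' is left to the caller.

```python
from statistics import median


def spread_close_means(means, subset, divisor=25.0):
    """Return a copy of `means` with the entries at `subset` spread apart.

    Each selected mean m_j is replaced by W * (m_j - md) + md, where md is
    the median of the selected means and W = 1 + R / divisor, with R the
    range of ALL group means (divisor=25 follows the R/25 suggestion).
    """
    R = max(means) - min(means)
    W = 1.0 + R / divisor
    md = median(means[j] for j in subset)
    adjusted = list(means)  # leave the original means untouched for the table
    for j in subset:
        adjusted[j] = W * (means[j] - md) + md
    return adjusted


# Hypothetical data: groups 2-4 (indices 1, 2, 3) are crowded together.
group_means = [10.0, 20.0, 21.0, 22.0, 35.0]
plot_means = spread_close_means(group_means, subset=[1, 2, 3])
print(plot_means)  # -> [10.0, 19.0, 21.0, 23.0, 35.0]
```

In this example R = 25, so W = 2 and the gaps within the subset double. The grand mean is unchanged here only because the three close means sit symmetrically about their median; in general it can shift slightly, as the post notes.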
@rmpruzek - Based on my understanding of your post, I'm not convinced your method is general enough to be safe.
Consider an example where we plot some group of means, but we're interested in means 1 - 4. Your method would identify a subset of k means (viz. 2 - 4), then apply pseudo-effects to them and visually alter their position. The potential problem I see is that in introducing pseudo-effects, you might also produce a situation where a new overplotting results from adjusting the old data. In the image below, adding pseudo-effects to means 2 - 4 actually results in means 1 and 2 now being overplotted:
So, in sum, I'm not convinced that your proposal "solves" the overplotting problem unless you can convince me that it will never introduce new overplotting.
This merely says that there may be situations where the method might have to be applied iteratively. In this case, k = 2 for the pair (1, 2). Apply the method again, and as long as W is reasonably chosen, all should be well after the second cycle has been completed. As to a general 'proof' that the method will never, can never, fail, let's remember the old dictum: the best (or perfect) can be the enemy of the good. I do not seek universal perfection, and recommend we get on w/ our lives after making a reasonable try for a good fix. (And I do not seek anyone's approval, nor should you mine.)

bob
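The iterative idea can be sketched as below, again in Python for illustration only. Several things here are assumptions rather than anything settled in this thread: the `min_gap` threshold for 'too close', detecting clusters by scanning adjacent sorted means, keeping W fixed at 1 + R/25 using the range of the original means, and the `max_cycles` cap. The function names are hypothetical.

```python
from statistics import median


def close_clusters(means, min_gap):
    """Group indices of means whose sorted neighbours are closer than min_gap."""
    order = sorted(range(len(means)), key=lambda j: means[j])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if means[nxt] - means[prev] < min_gap:
            current.append(nxt)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [nxt]
    if len(current) > 1:
        clusters.append(current)
    return clusters


def spread_until_clear(means, min_gap, divisor=25.0, max_cycles=10):
    """Repeat the W * (m_j - md) + md adjustment until no cluster remains.

    Returns the adjusted means and the number of cycles applied. If
    max_cycles is reached, close clusters may still be present.
    """
    adjusted = list(means)
    W = 1.0 + (max(means) - min(means)) / divisor  # assumed fixed across cycles
    for cycle in range(max_cycles):
        clusters = close_clusters(adjusted, min_gap)
        if not clusters:
            return adjusted, cycle
        for cluster in clusters:
            md = median(adjusted[j] for j in cluster)
            for j in cluster:
                adjusted[j] = W * (adjusted[j] - md) + md
    return adjusted, max_cycles


# Hypothetical data: spreading 10, 11, 12 pushes 10 down to 9, next to 8.
result, cycles = spread_until_clear([0.0, 8.0, 10.0, 11.0, 12.0, 25.0], min_gap=1.5)
print(result, cycles)  # -> [0.0, 7.5, 9.5, 11.0, 13.0, 25.0] 2
```

The example reproduces the situation raised above: the first cycle creates a new close pair, and a second cycle resolves it. This shows one case and is not evidence that the loop always terminates cleanly, which is why the cap is there.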
Correcting an error in my post of 4 days ago: My sentence (near the middle) "Finally, add the midrange mr to each of these revised pseudo effects; these will be of the form W*(m < j > - md) + md." should say, "Finally, add the median md to each of these revised pseudo effects; these will be of the form W*(m < j > - md) + md."

There is another issue here too, tho' this one really should be discussed synchronously (ichat?): When the no. of cases (N) becomes 'quite large' we might transition to something along the line of boxplots (w/ jittered interior points), or violin plots for the respective groups. That the issue involves various questions of judgment, programming complexity, etc., plus the fact that it is so basic, is why it deserves discussion among several of us. We might also look at some real-data trials w/ the k (subset of) means idea implemented if you, Brian, would be willing to write the code for this to facilitate trials.

b
Refs #134
The default value for jitter (`jj`) in `granova.1w` was `1`.

For `granovagg.1w`, we introduced overplotting logic that would recalibrate jitter if groups were too close together. The idea was to protect the user: if groups were too close together, the jittering amount would be very conservative so adjacently plotted groups could still be individually resolved. But introducing that logic involved a key departure from the classic `granova.1w` API. With the new overplotting logic:

- `jj` was set to `NULL`.
- If `jj == NULL`, the jitter amount defaulted to one percent of the horizontal resolution of the data, UNLESS `jj == NULL` and at least one pair of means was in danger of overplotting.
- If there was overplotting danger when `jj` was set to `NULL`, the jitter amount would default to the smallest distance between adjacent contrasts.

If we want to change the default value of `jj` to 1, I fear we'll have to go one of three routes:

1. […]
2. [Override user-supplied] `jj` values if the group means are in danger of overplotting.
3. [Add a parameter to] `granovagg.1w` called something like `safe.jittering` that would turn the overplotting logic on or off, thus allowing the user to bypass overplotting protection.

Option 1 is simple and possible. Combined with our current logic for marking likely overplotted groups in red, it lets the user simply tweak and shrink jittering until they can safely resolve groups. But it's risky if users don't recognize when their data is overplotted.
Option 2 strikes me as undesirable: it gives users the illusion of having control over jittering when really we can be like overprotective parents and override their supplied value if we think it's too dangerous.
Option 3 is kludgy, inelegant, and adds an additional burden to the user for remembering parameters.
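To make the trade-offs concrete, here is a rough Python sketch (illustrative only, not the `granovagg.1w` implementation) of the `NULL`-default rule combined with an Option 3 style switch. Several details are assumptions: "horizontal resolution" is taken to be the range of the contrasts, "danger of overplotting" is taken to mean that the smallest gap between adjacent contrasts is below the one-percent default, and `resolve_jitter` and `safe_jittering` are hypothetical names.

```python
def resolve_jitter(contrasts, jj=None, safe_jittering=True):
    """Pick a jitter amount for a one-way plot.

    - A user-supplied jj is returned unchanged (no override).
    - With jj=None, default to 1% of the range of the contrasts (assumed
      meaning of "horizontal resolution").
    - With jj=None and safe_jittering on, fall back to the smallest gap
      between adjacent contrasts when that gap is below the default
      (assumed overplotting test).
    """
    if jj is not None:
        return jj
    ordered = sorted(contrasts)
    default = 0.01 * (ordered[-1] - ordered[0])
    if not safe_jittering or len(ordered) < 2:
        return default
    smallest_gap = min(b - a for a, b in zip(ordered, ordered[1:]))
    return smallest_gap if smallest_gap < default else default


# Hypothetical contrasts: two groups sit only 0.02 apart.
crowded = [-5.0, -0.02, 0.0, 5.0]
protected = resolve_jitter(crowded)                          # smallest gap wins
unprotected = resolve_jitter(crowded, safe_jittering=False)  # plain 1% default
explicit = resolve_jitter(crowded, jj=1)                     # user value kept
print(protected, unprotected, explicit)
```

Under these assumptions the switch only matters when `jj` is left unset, which is one way to avoid the Option 2 problem of silently overriding a value the user supplied.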
I need both @rmpruzek and @wildoane to weigh in on this issue before I can go forward with any changes.