Closed WilDoane closed 13 years ago
Note: I'm trying to rewrite my DSAP article from last year to reflect granovaGG.
@WilDoane - I think you're right. I manually adjusted the display padding factors and it looks like the CI band actually dips below the difference score band. I had not anticipated such situations.
Would it be worth emailing Hadley to see whether there's a way to directly check a ggplot object to see whether it falls within the plot bounds?
My guess is there isn't, because the actual rendering happens too late in the process (well after something has been "p + geom"ed).
-Wil
William Doane http://DrDoane.com
On 2011 Aug 22, at 17:14, briandk wrote:
@Wil - I think you're right. I manually adjusted the display padding factors and it looks like the CI band actually dips below the difference score band. I had not anticipated such situations.
http://BrianDK.com/screenshots/granovaGGdsCIBand-20110822-171313.png
Reply to this email directly or view it on GitHub: https://github.com/briandk/granovaGG/issues/74#issuecomment-1874569
@WilDoane - I could check with Hadley, but:
- There might not be a way (as you note above)
- Even if there is a way, it might take him awhile to get back to me
I'm thinking I might have a suitable solution (see below).
Theoretical background
What's happening here is that the CI Band's coordinates lie outside the boundaries of the visible plot. Right now, the boundaries of the visible plot get set in GetGraphicsParams():
.aggregate.data.range <- c(range(GetXs(dsp$data)), range(GetYs(dsp$data)))
.extrema <- c(max(.aggregate.data.range), min(.aggregate.data.range))
.square.data.range <- max(.extrema) - min(.extrema)
.southwest.padding <- (65/100) * .square.data.range
.north.east.padding <- (15/100) * .square.data.range
.lower.graphical.bound <- min(.extrema) - .southwest.padding
.upper.graphical.bound <- max(.extrema) + .north.east.padding
.bounds <- c(.lower.graphical.bound, .upper.graphical.bound)
The last line (bounds) is what gets passed directly as limits to ggplot2's continuous axis scale functions.
To help you understand what the above code does, consider this visual approach:
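Essentially, what's happening is:
- Determining the smallest square that can circumscribe the entire dataset, by looking at the range of x and y values, deciding which range of values is bigger, and using that bigger value as the side length of a square to define the square.data.range.window
- Padding the viewing window out to the southwest by 65%
- Padding the viewing window out to the northeast by 15%
- Returning the limits of the visible plot as a 2-tuple of the expanded southwest and northeast corners
- Passing that tuple as the limits of both x and y scales
The things to notice
The viewable plot window is defined by computations on the locations of raw datapoints. It doesn't depend on subsequent statistics computed on those datapoints.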
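The bounds computation described above can be sketched in base R as follows. This is only an illustration: ComputeSquareBounds is a hypothetical name, plain vectors stand in for GetXs(dsp$data) and GetYs(dsp$data), and the scores are made up. The 65% and 15% padding factors are the ones from the code above.

```r
# Hypothetical stand-in for the bounds logic in GetGraphicsParams().
# xs and ys play the role of GetXs(dsp$data) and GetYs(dsp$data).
ComputeSquareBounds <- function(xs, ys, southwest = 0.65, northeast = 0.15) {
  # Smallest and largest raw coordinate across both axes
  extrema <- range(c(xs, ys))
  # Side length of the square that encloses all raw datapoints
  square.data.range <- diff(extrema)
  c(
    lower = extrema[1] - southwest * square.data.range,
    upper = extrema[2] + northeast * square.data.range
  )
}

# Made-up scores, purely for illustration
xs <- c(10, 12, 15, 20)
ys <- c(11, 14, 13, 18)
bounds <- ComputeSquareBounds(xs, ys)
print(bounds)
```

Note that nothing in this computation looks at the Crossbow or the CI Band, which is why those elements can end up outside the window.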
So, the reason granovagg.ds fails on @WilDoane's example dataset above is that the code that computes the visible plot boundaries isn't sensitive to
- information computed for the pairwise score differences (GetCrossbow)
- information computed for the confidence interval of the estimated treatment effect (GetCIBand)
Instead, the plot bounds are computed with sensible default paddings, but fail on weird datasets like this. I think we can do better.
The last thing to notice is that the computations guarantee the raw data will always be visible. So, the question then is, "how can we guarantee that the CI Band and Difference Crossbow will also always be visible?"
My proposed solution
First, a restatement of the problem:
For certain datasets, either the CI Band or the Crossbow may extend beyond the visible plot window.
The thing to remember is that if either of them "breaks" the walls of the plot window, math guarantees it's only going to happen in two possible locations:
- The south side of the graph
- The west side of the graph
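With that in mind, I propose writing some code that would essentially do the following:
1. Take all the information about the coordinates for the Crossbow and CI Band
2. Put that information in some dataframe (say, CrossElements)
3. Determine whether the minimum y-coordinate in CrossElements is lower than the minimum coordinate in dsp$params$bounds
   - If NO, do nothing
   - If YES, replace the minimum coordinate in bounds with the minimum coordinate in CrossElements (plus some padding, if need be)
4. Repeat step 3 for the x-coordinate
I've already rewritten the code to make sure both cross elements plot left-to-right. So, I'm pretty sure we can cbind() their outputs to a dataframe and just examine the minimum x and minimum y.end values, respectively.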
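A minimal base R sketch of the proposed check might look like this. ExpandLowerBound is a hypothetical helper, and the layout of the data frame (one row per cross element, with x and y.end columns) is an assumption based on the description above, not the package's actual structure. Because the same bounds tuple is used as the limits of both scales, the x and y checks collapse into a single comparison here.

```r
# Hypothetical helper: widen the lower plot bound when the Crossbow or
# CI Band would fall below it. The x and y.end column names follow the
# comment above; the data frame layout is assumed for illustration.
ExpandLowerBound <- function(bounds, cross.elements, padding = 0) {
  # Lowest coordinate reached by either cross element on either axis
  lowest <- min(cross.elements$x, cross.elements$y.end)
  if (lowest < min(bounds)) {
    # Replace the lower bound, leaving the upper bound untouched
    bounds[which.min(bounds)] <- lowest - padding
  }
  bounds
}

bounds <- c(3.5, 21.5)
cross.elements <- data.frame(
  x     = c(5.0, 2.0),  # Crossbow, CI Band (made-up values)
  y.end = c(4.0, 1.0)
)
new.bounds <- ExpandLowerBound(bounds, cross.elements, padding = 0.5)
print(new.bounds)

# When both elements already sit inside the window, nothing changes
unchanged <- ExpandLowerBound(bounds, data.frame(x = 5, y.end = 4))
```

The padding argument is optional; how much extra room looks right would need to be judged against real plots.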
Perhaps just getting the min(...) of
and setting the graphic bound to that value?
-Wil
William Doane http://DrDoane.com
On 2011 Aug 22, at 19:32, briandk wrote:
@WilDoane - I could check with Hadley, but:
- There might not be a way (as you note above)
- Even if there is a way, it might take him awhile to get back to me
I'm thinking I might have a suitable solution (see below).
Theoretical background
What's happening here is that the CI Band's coordinates lie outside the boundaries of the visible plot. Right now, the boundaries of the visible plot get set in GetGraphicsParams():
.aggregate.data.range <- c(range(GetXs(dsp$data)), range(GetYs(dsp$data)))
.extrema <- c(max(.aggregate.data.range), min(.aggregate.data.range))
.square.data.range <- max(.extrema) - min(.extrema)
.southwest.padding <- (65/100) * .square.data.range
.north.east.padding <- (15/100) * .square.data.range
.lower.graphical.bound <- min(.extrema) - .southwest.padding
.upper.graphical.bound <- max(.extrema) + .north.east.padding
.bounds <- c(.lower.graphical.bound, .upper.graphical.bound)
The last line (bounds) is what gets passed directly as limits to ggplot2's continuous axis scale functions.
To help you understand what the above code does, consider this visual approach:
Essentially, what's happening is:
- Determining the smallest square that can circumscribe the entire dataset, by looking at the range of x and y values, deciding which range of values is bigger, and using that bigger value as the side length of a square to define the square.data.range.window
- Padding the viewing window out to the southwest by 65%
- Padding the viewing window out to the northeast by 15%
- Returning the limits of the visible plot as a 2-tuple of the expanded southwest and northeast corners
- Passing that tuple as the limits of both x and y scales
The things to notice
The viewable plot window is defined by computations on the locations of raw datapoints. It doesn't depend on subsequent statistics computed on those datapoints.
So, the reason granovagg.ds fails on @WilDoane's example dataset above is that the code that computes the visible plot boundaries isn't sensitive to
- information computed for the pairwise score differences (GetCrossbow)
- information computed for the confidence interval of the estimated treatment effect (GetCIBand)
Instead, the plot bounds are computed with sensible default paddings, but fail on weird datasets like this. I think we can do better.
The last thing to notice is that the computations guarantee the raw data will always be visible. So, the question then is, "how can we guarantee that the CI Band and Difference Crossbow will also always be visible?"
My proposed solution
First, a restatement of the problem:
For certain datasets, either the CI Band or the Crossbow may extend beyond the visible plot window
The thing to remember is that if either of them "breaks" the walls of the plot window, math guarantees it's only going to happen in two possible locations:
- The south side of the graph
- The west side of the graph
With that in mind, I propose writing some code that would essentially do the following:
1. Take all the information about the coordinates for the Crossbow and CI Band
2. Put that information in some dataframe (say, CrossElements)
3. Determine whether the minimum y-coordinate in CrossElements is lower than the minimum coordinate in dsp$params$bounds
   - If NO, do nothing
   - If YES, replace the minimum coordinate in bounds with the minimum coordinate in CrossElements (plus some padding, if need be)
4. Repeat step 3 for the x-coordinate
I've already rewritten the code to make sure both cross elements plot left-to-right. So, I'm pretty sure we can cbind() their outputs to a dataframe and just examine the minimum x and minimum y.end values, respectively.
Reply to this email directly or view it on GitHub: https://github.com/briandk/granovaGG/issues/74#issuecomment-1875673
Reference: http://DrDoane.com/2010/07/133/
The plot doesn't contain a green CI line on my display.
Granted, the data here are artificial, and it's probably plotting the lower coordinate of the CI line outside the ggplot display zone.