briandk / granovaGG

Bob Pruzek and Jim Helmreich's implementation of Elemental Graphics for Analysis of Variance

plot is missing the green CI line #74

Closed WilDoane closed 13 years ago

WilDoane commented 13 years ago

Reference: http://DrDoane.com/2010/07/133/

    install.packages(pkgs="devtools", dependencies=TRUE)
    library("devtools")

    install_github(repo="granovaGG", username="briandk", branch="dev")
    library(granovaGG)

    x <- cbind(post=c(45, 30, 53, 55, 40), pre=c(22, 33, 35, 32, 27))
    x <- as.data.frame(x)
    x
    granovagg.ds(x)

The plot doesn't contain a green CI line, on my display.

Granted, the data here are artificial, and it's probably plotting the lower coordinate of the CI line outside the ggplot display zone.

WilDoane commented 13 years ago

Note: I'm trying to rewrite my DSAP article from last year to reflect granovaGG.

briandk commented 13 years ago

@WilDoane - I think you're right. I manually adjusted the display padding factors and it looks like the CI band actually dips below the difference score band. I had not anticipated such situations.

granovagg.ds CI Band (screenshot: http://BrianDK.com/screenshots/granovaGGdsCIBand-20110822-171313.png)

WilDoane commented 13 years ago

Would it be worth emailing Hadley to see whether there's a way to directly check a ggplot object to see whether it falls within the plot bounds?

My guess is there isn't, because the actual rendering happens too late in the process (well after something has been "p + geom"ed).

-Wil

William Doane http://DrDoane.com

briandk commented 13 years ago

@WilDoane - I could check with Hadley, but:

  1. There might not be a way (as you note above)
  2. Even if there is a way, it might take him a while to get back to me

I'm thinking I might have a suitable solution (see below).

Theoretical background

What's happening here is that the CI Band's coordinates lie outside the boundaries of the visible plot. Right now, the boundaries of the visible plot get set in GetGraphicsParams()

    .aggregate.data.range  <- c(range(GetXs(dsp$data)), range(GetYs(dsp$data)))
    .extrema               <- c(max(.aggregate.data.range), min(.aggregate.data.range))    
    .square.data.range     <- max(.extrema) - min(.extrema)
    .southwest.padding     <- (65/100) * .square.data.range
    .north.east.padding    <- (15/100) * .square.data.range
    .lower.graphical.bound <- min(.extrema) - .southwest.padding
    .upper.graphical.bound <- max(.extrema) + .north.east.padding
    .bounds                <- c(.lower.graphical.bound, .upper.graphical.bound)

The last line (.bounds) is what gets passed directly as the limits to ggplot2's continuous axis scale functions.
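That would also explain why the band vanishes instead of being clipped at the edge. As far as I understand ggplot2's defaults, a continuous scale that is given explicit limits treats values outside those limits as missing, and the layer then drops the affected rows. The base-R sketch below only imitates that censoring step. CensorToLimits and the segment values are made up for illustration; they are not part of ggplot2 or granovaGG.

```r
# Imitation of out-of-bounds censoring: anything outside the limits becomes NA.
# (Hypothetical helper for illustration, not a ggplot2 function.)
CensorToLimits <- function(values, limits) {
  values[values < min(limits) | values > max(limits)] <- NA
  values
}

# Example limits, in the same c(lower, upper) shape as .bounds.
bounds <- c(0.55, 59.95)

# Made-up y-coordinates for the two ends of a segment;
# the second end dips below the lower bound.
segment.y <- c(10, -5)

censored <- CensorToLimits(segment.y, bounds)
print(censored)  # the out-of-range end is now NA
```

A segment with a missing endpoint can't be drawn, so the whole line disappears from the plot.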

To help you understand what the above code does, consider this visual approach:

Visual sketch of the viewable plot window

Essentially what's happening is

  1. Determining the smallest square that can circumscribe the entire dataset (by looking at the range of x and y values)
  2. Deciding which range of values is bigger
  3. Using that bigger value as the side length of the square (.square.data.range)
  4. Padding the viewing window out to the southwest by 65%
  5. Padding the viewing window out to the northeast by 15%
  6. Returning the limits of the visible plot as a 2-tuple of the expanded southwest and northeast corners
  7. Passing that tuple as the limits of both x and y scales
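To make the steps concrete, here is the same arithmetic run on the example dataset from the top of this issue. This is a standalone sketch: plain vectors stand in for GetXs() and GetYs(), whose definitions aren't shown here, and the variable names drop the leading dots. The padding factors are the ones from the code above.

```r
# Example data from the top of this issue.
post <- c(45, 30, 53, 55, 40)
pre  <- c(22, 33, 35, 32, 27)

# Pool the x and y ranges. Which variable sits on which axis doesn't matter,
# because only the overall minimum and maximum are used.
aggregate.data.range <- c(range(pre), range(post))
square.data.range    <- max(aggregate.data.range) - min(aggregate.data.range)  # 55 - 22 = 33

southwest.padding <- (65/100) * square.data.range  # 21.45
northeast.padding <- (15/100) * square.data.range  # 4.95

bounds <- c(min(aggregate.data.range) - southwest.padding,
            max(aggregate.data.range) + northeast.padding)
print(bounds)  # 0.55 59.95
```

So for this dataset the visible window runs from 0.55 to 59.95 on both axes, and anything plotted below 0.55 falls outside it.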

The things to notice

The viewable plot window is defined by computations on the locations of raw datapoints. It doesn't depend on subsequent statistics computed on those datapoints.

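So, the reason granovagg.ds fails on @WilDoane's example dataset above is that the code that computes the visible plot boundaries isn't sensitive to

  • information computed for the pairwise score differences (GetCrossbow)
  • information computed for the confidence interval of the estimated treatment effect (GetCIBand)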

Instead, the plot bounds are computed with sensible default paddings, but fail on weird datasets like this. I think we can do better.

The last thing to notice is that the computations guarantee the raw data will always be visible. So, the question then is, "how can we guarantee that the CI Band and Difference Crossbow will also always be visible?"
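To illustrate the mismatch, the sketch below computes the difference scores and a confidence interval for their mean using base R only. It assumes a standard 95% paired t interval, which may not be exactly what GetCIBand does, and it says nothing about how granovaGG maps the interval onto plot coordinates. The point is only that these quantities come from the differences, while the window comes from the raw scores.

```r
post <- c(45, 30, 53, 55, 40)
pre  <- c(22, 33, 35, 32, 27)

# Pairwise difference scores and their mean.
differences     <- post - pre        # 23 -3 18 23 13
mean.difference <- mean(differences) # 14.8

# 95% t-based confidence interval for the mean difference
# (assumed here; the interval granovaGG actually draws may differ).
ci <- t.test(post, pre, paired = TRUE)$conf.int

# The padded window, by contrast, starts from the raw scores only.
raw.range <- range(c(pre, post))     # 22 55

print(differences)
print(ci)
print(raw.range)
```

Nothing in raw.range changes if the differences get more spread out while the overall minimum and maximum stay put, which is why the default padding can fall short.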

My proposed solution

First, a restatement of the problem:

For certain datasets, either the CI Band or the Crossbow may extend beyond the visible plot window

The thing to remember is that if either of them "breaks" the walls of the plot window, math guarantees it's only going to happen in two possible locations:

  • The south side of the graph
  • The west side of the graph

With that in mind, I propose writing some code that would essentially do the following:

  1. Take all the information about the coordinates for the Crossbow and CI Band
  2. Put that information in some dataframe (say, CrossElements)
  3. Determine whether the minimum y-coordinate in CrossElements is lower than the minimum coordinate in dsp$params$bounds
    1. If NO, do nothing
    2. If YES, replace the minimum coordinate in bounds with the minimum coordinate in CrossElements (plus some padding, if need be)
  4. Repeat step 3 for the x-coordinate

I've already rewritten the code to make sure both cross elements plot left-to-right. So, I'm pretty sure we can cbind() their outputs to a dataframe and just examine the minimum x and minimum y.end values, respectively.
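Here's a rough sketch of what that check could look like. Everything in it is hypothetical: ExpandLowerBound, the column names (x, y, xend, yend), and the example coordinates are placeholders, not granovaGG's actual internals. Because bounds is a single pair shared by both scales, steps 3 and 4 collapse into one comparison against the lowest coordinate of any kind.

```r
# Hypothetical helper: widen the lower bound if any cross element dips below it.
# Column names are assumed; adjust to whatever GetCrossbow/GetCIBand return.
ExpandLowerBound <- function(bounds, cross.elements, padding = 0) {
  coordinate.columns <- intersect(c("x", "y", "xend", "yend"), names(cross.elements))
  stopifnot(length(coordinate.columns) > 0)

  lowest.coordinate <- min(unlist(cross.elements[coordinate.columns]), na.rm = TRUE)

  # If NO (nothing dips below the window), leave the bounds alone.
  # If YES, replace the lower bound, with optional extra padding.
  if (lowest.coordinate < min(bounds)) {
    bounds[which.min(bounds)] <- lowest.coordinate - padding
  }
  bounds
}

# Made-up coordinates for a crossbow and a CI band, one row each.
CrossElements <- data.frame(
  x    = c(5, 8),
  y    = c(20, 12),
  xend = c(25, 30),
  yend = c(2, -3)
)

print(ExpandLowerBound(c(0.55, 59.95), CrossElements))               # -3.00 59.95
print(ExpandLowerBound(c(0.55, 59.95), CrossElements, padding = 1))  # -4.00 59.95
```

If nothing falls below the current lower bound, the function returns bounds unchanged, so it should be safe to call on every dataset.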

WilDoane commented 13 years ago

Perhaps just getting the min(...) of

and setting the graphic bound to that value?

-Wil

William Doane http://DrDoane.com
