daattali / ggExtra

πŸ“Š Add marginal histograms to ggplot2, and more ggplot2 enhancements
http://daattali.com/shiny/ggExtra-ggMarginal-demo/
Other
383 stars 45 forks source link

Make first attempt at refactoring ggMarginal (should close #38 and #37) #40

Closed crew102 closed 7 years ago

crew102 commented 7 years ago

This PR makes substantial changes to the internal design of ggMarginal (mostly towards the beginning of the function). Hopefully it doesn't break anything...

I think there is still more work to be done - there are still several levels of abstraction all mixed about in the main body of ggMarginal. Let's start with this and take it from there.

crew102 commented 7 years ago

I think we should add support for testing ggMarginal for each version of ggplot2 that exists, so we don't have to check manually. We could probably just customize the build on travis to do this. Another option would be circleci. Do you have a preference? I don't mind writing the build file, but it should be linked to your account (circleci or travis).

daattali commented 7 years ago

Quite the PR! I'll try to take a look on the weekend. Have you tried testing on different ggplot2 versions?

You're very right that there needs to be some testing system in place. At the time I created the package, I didn't want to add tests because the only possible test I could think of was saving images of different calls and comparing images. It seemed terrible to me. But now after seeing how much time I spend manually testing each version, it was definitely a bad decision to not use tests. So yes, please do go ahead and add tests if you don't mind!

If you do decide to tackle adding tests, maybe it'd be useful to do that first, leave this PR open, generate the tests, and only afterwards come back to this PR to make sure that nothing broke?

Thank you for all your dedication!

daattali commented 7 years ago

Also, could you please email me (I wasn't able to find your email or any contact info)

daattali commented 7 years ago

Just to make sure we're on the same page: the plan is to add unit testing first, and only once that is working we will come back to tackle this, right? That way we ensure this PR doesn't introduce any regression bugs

crew102 commented 7 years ago

Yep, that was the plan

crew102 commented 7 years ago

Hey Dean -

I wanted to see what you thought about me adding myself as a contributor to ggMarginal in the DESCRIPTION file. I figured it's better to ask in private than on github.

Chris

On Sat, Apr 15, 2017 at 12:57 AM, Dean Attali notifications@github.com wrote:

Just to make sure we're on the same page: the plan is to add unit testing first, and only once that is working we will come back to tackle this, right? That way we ensure this PR doesn't introduce any regression bugs

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/daattali/ggExtra/pull/40#issuecomment-294272436, or mute the thread https://github.com/notifications/unsubscribe-auth/AKs3OYHn4PWTG7QxUxczTYSVqyxfLPwYks5rwE4-gaJpZM4MVpt3 .

crew102 commented 7 years ago

Sorry, meant to say ggExtra, not marginal

On Thu, Apr 27, 2017 at 4:57 PM, Christopher Baker <chriscrewbaker@gmail.com

wrote:

Hey Dean -

I wanted to see what you thought about me adding myself as a contributor to ggMarginal in the DESCRIPTION file. I figured it's better to ask in private than on github.

Chris

On Sat, Apr 15, 2017 at 12:57 AM, Dean Attali notifications@github.com wrote:

Just to make sure we're on the same page: the plan is to add unit testing first, and only once that is working we will come back to tackle this, right? That way we ensure this PR doesn't introduce any regression bugs

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/daattali/ggExtra/pull/40#issuecomment-294272436, or mute the thread https://github.com/notifications/unsubscribe-auth/AKs3OYHn4PWTG7QxUxczTYSVqyxfLPwYks5rwE4-gaJpZM4MVpt3 .

daattali commented 7 years ago

Of course Chris! I was meaning on doing that after your previous PR was merged but forgot to. Submit a PR for that :)


Dean Attali President & CEO AttaliTech Ltd http://AttaliTech.com http://attalitech.com

On 27 April 2017 at 16:57, Chris Baker notifications@github.com wrote:

Sorry, meant to say ggExtra, not marginal

On Thu, Apr 27, 2017 at 4:57 PM, Christopher Baker < chriscrewbaker@gmail.com

wrote:

Hey Dean -

I wanted to see what you thought about me adding myself as a contributor to ggMarginal in the DESCRIPTION file. I figured it's better to ask in private than on github.

Chris

On Sat, Apr 15, 2017 at 12:57 AM, Dean Attali notifications@github.com wrote:

Just to make sure we're on the same page: the plan is to add unit testing first, and only once that is working we will come back to tackle this, right? That way we ensure this PR doesn't introduce any regression bugs

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/daattali/ggExtra/pull/40#issuecomment-294272436, or mute the thread https://github.com/notifications/unsubscribe-auth/ AKs3OYHn4PWTG7QxUxczTYSVqyxfLPwYks5rwE4-gaJpZM4MVpt3 .

β€” You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/daattali/ggExtra/pull/40#issuecomment-297837312, or mute the thread https://github.com/notifications/unsubscribe-auth/AA6IFKfQUwd648GzUi1316jq33LHuky3ks5r0QFEgaJpZM4MVpt3 .

daattali commented 7 years ago

@crew102 looks like "in private" didn't really work :p But that's ok, your acceptance can be visible to all :)

crew102 commented 7 years ago

oh whops - didn't realize it was getting published to github...i guess i'll have to assume it was my code and not any public shamming that got me the contrib line :)

daattali commented 7 years ago

We'll never know.......


Dean Attali President & CEO AttaliTech Ltd http://AttaliTech.com http://attalitech.com

On 27 April 2017 at 20:33, Chris Baker notifications@github.com wrote:

oh whops - didn't realize it was getting published to github...i guess i'll have to assume it was my code and not any public shamming that got me the contrib line :)

β€” You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/daattali/ggExtra/pull/40#issuecomment-297875819, or mute the thread https://github.com/notifications/unsubscribe-auth/AA6IFDbhzVA7lmxCIm3M21MEpbc_ToySks5r0TPwgaJpZM4MVpt3 .

daattali commented 7 years ago

Is this PR ready? Is there a unit test for coord_flip()? (to make sure bug #37 doesn't happen in the future)

crew102 commented 7 years ago

Sorry, was going to add some comments when I got back from home. But yes, a test for coord_flip working should be added - didn't think of that. I'll add that unit test along with several ones that test that our helper funs work, repush, then add some comments for the final pr

crew102 commented 7 years ago

This PR:

  1. Refactors ggMarginal

    • Several of the stages/tasks that ggMaringal completes to create the final plot are now broken into (more) helper functions
    • The accession of the panel_scales object is now dependent on the ggplot2 version used (see getPanelScale function). This should close #38.
    • We now check if the coordinate system was flipped by the user, and if it was, ggMarginal respects that transformation in the final plot (see needsFlip). A visual test was added to test this behavior... This should close #37.
  2. Adds unit tests to verify our methods/assumptions surrounding ggplot2 internals (together with 1 above this should close #43).

daattali commented 7 years ago

This is fantastic! I'll take a look hopefully over the weekend, thank you

crew102 commented 7 years ago

Hey Dean -

I will make the changes you requested. I have two questions about two chunks of your original code (these chunks are also in the refactor, and I'm wondering if we can get rid of them):

  1. What is the purpose of this chunk:

    # Add the longest y axis label to the top plot and ensure it's at a y value
    # that is on the plot (this is why I build the top plot, to know the y values)
    pbTop <- ggplot2::ggplot_build(top)
    ylabels <- scatPbuilt$layout$panel_ranges[[1]]$y.labels
    ylabel <- ylabels[which.max(nchar(ylabels))]
    
    if (type == "boxplot") {
      breaks <- mean(getLimits(marg = "x", builtP = pbTop))
      top <- top + ggplot2::scale_x_continuous(breaks = breaks, labels = ylabel)
    } else {
      breaks <- mean(getLimits(marg = "y", builtP = pbTop))
      top <- top + ggplot2::scale_y_continuous(breaks = breaks, labels = ylabel)
    }

    and why is it run only when creating the top marginal plot, not the right marginal plot? When I remove this chunk all tests still pass.

  2. What is the purpose of the third line of this chunk in getScale (non-refactor version):

    if (type == "boxplot") {
      scale <- pb$layout$panel_scales$x[[1]]
      scale$aesthetics <- gsub("^x", "y", scale$aesthetics)
      scale$limits <- pb$layout$panel_scales$x[[1]]$get_limits() # this line
    }

and why do we change the limits only for the boxplot marginal plot? When I remove it all tests still pass (and the issue you raised re: coord_flip not working with type = "boxplot" seems to be resolved). Chris

daattali commented 7 years ago
  1. I don't remember the exact plot code that required this logic, this is why I wish I knew about testing in R when I made this! Essentially what was happening was that sometimes (I don't remember when) there would be a long y label - longer than all the other labels - and because of it the original plot would be pushed to the right, and somehow that affected the alignment of the main plot with the top plot. But I really cannot remember what plots cause this. I'm ok with removing this for now so that there is no unknown mystery code, and if this issue ever comes up we can revisit. It's a good evidence of why tests really need to cover all bases :)

  2. Again, I have no good answer for you, I'm certain that the limits are added for a good reason, but I cannot remember why. Tracking down when that code was added: https://github.com/daattali/ggExtra/commit/0743b214611896c1a943f461a368b3f113b37961 which references this issue: https://github.com/daattali/ggExtra/issues/18 but I don't think these lines were added for that issue. We can remove this as well.

Anything that was in the original code and was removed (because I didn't make it clear why it was needed!) please document in the NEWS so that we keep track of what was removed.

Thanks for being so thorough - I know it's never fun to have to navigate someone else's code and try to understand what went through his head!

crew102 commented 7 years ago

OK, I'll remove those and document in NEWS. If it breaks someone's code then we can add back what is needed and add new test cases as well.

crew102 commented 7 years ago

Last question - Why add the axis labels to the marginal plots when we don't want them there (and they won't show up anyway. due to theme object saying so). For example, on line 183 in ggMarginal.R:

top <- top +
      ggplot2::ylab(p$labels$y)
crew102 commented 7 years ago

Nevermind, I think this comment says why:

  # - Use the same axis titles as the main plot, to ensure the same space is taken

I wonder if we can the addition of the labs, though. What do you think?

daattali commented 7 years ago

I think that's along the reasoning as why I have that code in question 1 above. But I cannot reproduce :)

crew102 commented 7 years ago

OK, that's what I figured. I'll push changes sometime later today.

crew102 commented 7 years ago

One thing to note here, running devtools::check will throw the following NOTE:

margPlotNoGeom: no visible binding for global variable β€˜var’
Undefined global functions or variables:
  var

I assume we will have to put a comment about this in cran-comments.md when it comes time to resubmit.

crew102 commented 7 years ago

Hey Dean, don't merge this yet - I think I lost some of my commits along the way.

crew102 commented 7 years ago

Sorry about that - I was working on two different machines with the same email address and somehow lost some commits along the way. The tip of this branch should contain them all now, though.

daattali commented 7 years ago

I do see there are some merge commits today - those really are dangerous and annoying. I always freak out when I accidentally do a merge and then can't properly see what changes were done, it's so unintuitive.... I hope you brought back all the code!

I'll merge tomorrow or the next day

Thanks for also breaking up the work into separate commits, it helps when reviewing

daattali commented 7 years ago

I just noticed you mentioned the R CMD NOTE about the binding of var, and I see where it is in the code. I think using ggplot2::aes_string() could be used to avoid that, no? CRAN submission is going to be difficult with this NOTE if it seems like there is a way around it!

crew102 commented 7 years ago

I think this should do it!

daattali commented 7 years ago

merging into non-master branch