daattali / ggExtra

📊 Add marginal histograms to ggplot2, and more ggplot2 enhancements
http://daattali.com/shiny/ggExtra-ggMarginal-demo/

Showing both density and histogram marginals in the same plot #118

Closed IndrajeetPatil closed 6 years ago

IndrajeetPatil commented 6 years ago

In Python, there is an easy way to plot both the density and the histogram of each marginal distribution on the axes simultaneously (https://seaborn.pydata.org/generated/seaborn.jointplot.html).

Would it be possible to implement this in ggExtra as well?

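[image: https://user-images.githubusercontent.com/11330453/40520097-a2df5de0-5f91-11e8-9b61-6e8918b4d0cb.png]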

crew102 commented 6 years ago

The feature you're describing should be easy enough to add. The question is, what should the syntax be for making such a call to ggExtra? Thoughts @daattali?

On Thu, May 24, 2018, 8:34 PM Indrajeet Patil wrote:

[image: image] https://user-images.githubusercontent.com/11330453/40520097-a2df5de0-5f91-11e8-9b61-6e8918b4d0cb.png

IndrajeetPatil commented 6 years ago

Ha, I was thinking the same thing. In my package, one function can create box and/or violin plots, so the possible options are "box", "violin", and "boxviolin". I was thinking something similar might work here. I know this is silly, but maybe a portmanteau like "densigram"?

People are coming up with new names for unusual plots all the time (https://orchid00.github.io/tidy_raincloudplot).

crew102 commented 6 years ago

Hmm, yeah, I think I'd have to agree with you re: coming up with a name for the combined histogram/density plot. I was initially thinking that it would be more natural to instead pass both plot types to type (e.g., type = c("histogram", "density")), so that the syntax would extend to other pairings of plot types (e.g., type = c("histogram", "violin")). However, I think the histogram/density pairing is the only one that will be used, so it makes sense to just come up with a name for this plot type.

daattali commented 6 years ago

@IndrajeetPatil thanks for the suggestion, that's a great idea! Regarding the name, I favour having a single name rather than a vector, and densigram sounds good to me. The only potential problem that immediately comes to mind is that it might be awkward to show the two types in different colours. I'm not sure what the parameters for that would look like, and I don't want to add more parameters just to support this. I don't think they can both be the same colour, because then you would lose a lot of information where they overlap, so there probably should be some mechanism for different colours. This needs to be thought through.
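A minimal sketch of the single-name approach, assuming a hypothetical resolve_marginal_type() helper (not ggExtra code) and an assumed list of accepted types: match.arg() validates one string, and the combined "densigram" name expands internally into the two geoms it would draw.

```r
# Hypothetical helper, not part of ggExtra: validate a single `type` string
# and expand the combined "densigram" name into the geoms it draws.
# The list of accepted types below is an assumption for illustration.
resolve_marginal_type <- function(type = c("density", "histogram", "boxplot",
                                           "violin", "densigram")) {
  # Errors on unknown names; also accepts unambiguous prefixes like "hist"
  type <- match.arg(type)
  switch(type,
         densigram = c("histogram", "density"),  # combined plot: two layers
         type)                                   # any other type: one layer
}

resolve_marginal_type("densigram")
#> [1] "histogram" "density"
resolve_marginal_type("hist")
#> [1] "histogram"
```

A vector such as type = c("histogram", "density") would generalise to other pairings, but a single string keeps validation down to one match.arg() call.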

IndrajeetPatil commented 6 years ago

Hmm, fair enough. That's a tricky issue indeed.

But just to get this functionality off the ground, one option might be to offer no flexibility for the density line. That is, the xfill, yfill, etc. arguments would always be passed to the histogram, while the density line (without any fill) would always be solid (linetype) and black (color)?

Better to have something more rigid than not to have it at all. (I hate it when Python has something and R doesn't, and this plot is one of those things 😉 )
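The rigid option can be sketched with plain ggplot2 layers. This is a minimal sketch, assuming a hypothetical densigram_layers() helper (not ggExtra's API): the user's fill goes only to the histogram, the density line is fixed to solid black with no fill, and bins = 10 is an arbitrary choice. The histogram maps y to after_stat(density) so both layers share one y axis.

```r
library(ggplot2)

# Hypothetical helper, not ggExtra's API: the user's fill goes to the
# histogram only, while the density line is always solid black, unfilled.
densigram_layers <- function(fill = "grey35") {
  list(
    # Bars on the density scale so they share a y axis with the curve
    geom_histogram(aes(y = after_stat(density)), fill = fill, bins = 10),
    # Fixed styling for the density line
    geom_density(colour = "black", linetype = "solid", fill = NA)
  )
}

p <- ggplot(mtcars, aes(x = wt)) + densigram_layers(fill = "steelblue")
print(p)
```

Because the density styling is hard-coded, no new parameters are needed; the trade-off is that users cannot restyle the line.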

crew102 commented 6 years ago

> I don't think they can both be the same colour because then you would lose a lot of information where they overlap, so there probably should be some mechanism for different colours.

Technically they can both be the same colour and you'd be fine; it's only when the fill of the histogram is the same as the colour of the density line that you can't see the respective plots where they overlap. Also note that the default fill of histograms is grey and the default colour of density plots is black, so unless the user changes one of these they should be OK.

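library(ggplot2)

# Default colours: grey histogram fill, black density line.
# after_stat(density) rescales the bars so they share the density curve's y axis.
ggplot(mtcars, aes(x = wt)) +
    geom_histogram(aes(y = after_stat(density))) +
    geom_density()

# Problem case: the histogram fill matches the density line colour.
ggplot(mtcars, aes(x = wt)) +
    geom_histogram(aes(y = after_stat(density)), fill = "black") +
    geom_density()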

Plus, they could always specify an alpha value if they want the histogram fill to be the same as the density colour, while still being able to see where the two overlap.
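A minimal sketch of that alpha workaround in plain ggplot2; alpha = 0.4 and bins = 10 are arbitrary illustrative values, not anything ggExtra sets. Since the density layer is added last it is drawn on top, and the semi-transparent bars let the black line stand out.

```r
library(ggplot2)

# Histogram fill matches the density colour, but the bars are made
# semi-transparent so the black line (drawn on top as the later layer)
# remains distinguishable. alpha = 0.4 is an arbitrary choice.
p_alpha <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(aes(y = after_stat(density)),
                 fill = "black", alpha = 0.4, bins = 10) +
  geom_density(colour = "black")
print(p_alpha)
```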

crew102 commented 6 years ago

> Plus, they could always specify an alpha value if they want the histogram fill to be the same as the density colour, while still being able to see where the two overlap.

Actually, I forgot about this: https://github.com/rstudio/rstudio/issues/2196, so maybe suggesting alpha to users would not be wise here.

daattali commented 6 years ago

Thanks for the image references. I think the top one (grey and black) is fine, but I would argue the bottom one is not great: you can deduce where the line goes, but it looks awkward because half the line is hidden. And if there were just one observation in each bucket of the histogram, I think 90% of the line would be invisible. In my opinion the colours would have to be different. And yes, alpha is no good :) So I'm not sure: can we just assume the default colours will be different and leave it up to the user to ensure the colours remain different?
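If keeping the colours different is left to the user, the problematic case could still be caught with a warning. A minimal sketch, assuming a hypothetical check_densigram_colours() helper (not ggExtra code) that compares colours after converting them with grDevices::col2rgb(), so that "black" and "#000000" count as the same colour. The "grey35" default mirrors ggplot2's grey histogram fill.

```r
# Hypothetical helper, not part of ggExtra: warn when the histogram fill and
# the density line colour resolve to the same RGB value, since the line would
# then be invisible wherever it overlaps the bars.
check_densigram_colours <- function(hist_fill = "grey35", dens_colour = "black") {
  same <- all(grDevices::col2rgb(hist_fill) == grDevices::col2rgb(dens_colour))
  if (same) {
    warning("Histogram fill and density colour are identical; ",
            "the density line will be hidden where it overlaps the bars.",
            call. = FALSE)
  }
  invisible(!same)  # TRUE when the colours are distinguishable
}

check_densigram_colours("grey35", "black")   # silent
check_densigram_colours("black", "#000000")  # warns
```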

crew102 commented 6 years ago

The second image was my example of the problematic case, where the user wants the histogram fill to match their density colour. I think the most common case will be the one in the first image, where they don't specify any colours or fills.

daattali commented 6 years ago

Ah ok good, I thought you were trying to say that the second image is fine!