JetBrains / lets-plot

Multiplatform plotting library based on the Grammar of Graphics
https://lets-plot.org
MIT License

Is there an out-of-the-box way to do a faceted histogram with percentages instead of counts? #1155

Closed: araichev closed this issue 2 months ago

araichev commented 3 months ago

That is, without having to compute the percentages in the dataframe ahead of time? It seems possible in ggplot: https://forum.posit.co/t/trouble-scaling-y-axis-to-percentages-from-counts/42999/3

ASmirnov-HORIS commented 3 months ago

Hello!

To make a histogram display density instead of counts:

+ geom_histogram(aes(y='..density..'))

To format it as a percentage:

+ scale_y_continuous(format=".0%")

Could you please clarify what you mean by "faceted histogram"?

araichev commented 3 months ago

Thanks for your response, @ASmirnov-HORIS. To be clear, I want to keep the histogram as a bar chart and not do kernel density estimation. Regarding faceting, here's an example of what I mean: https://seaborn.pydata.org/examples/faceted_histogram.html , but instead of counts I want percentages relative to the individual group counts. Does that make sense?

ASmirnov-HORIS commented 3 months ago

Sorry if I confused you, but aes(y='..density..') does not apply the density statistic to the histogram; it's just a way of normalising the y-values. The normalisation should be such that the total area of the bars is 1, but we seem to have found a bug in our formulas, so this is currently not the case.

Nevertheless, here is some Lets-Plot code, based on your demo:

import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/penguins.csv")
ggplot(df, aes(x="flipper_length_mm")) + \
    geom_histogram(aes(y='..density..'), binwidth=3, center=1) + \
    scale_y_continuous(format=".0%") + \
    facet_grid(x="species", y="sex", y_order=-1)

[Image: 1155_plot1]

araichev commented 3 months ago

Thanks for the clarification and example, @ASmirnov-HORIS. Yes, with that code my plots have bars exceeding 100%, so something is wrong with the Lets-Plot formulas. I'll keep an eye on Issue 1157.

ASmirnov-HORIS commented 2 months ago

I see. Bar heights may exceed 1 (=100%) if binwidth is less than 1, because ..density.. is normalized so that the plot area, not the sum of the bar heights, equals 1. Are you looking for a different normalization (so that the sum of the values equals 1)? You could also check the geom_bar() API or the demo notebook. Let us know if there are any variables you would like to see in geom_histogram().
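To illustrate why an area-normalised bar can exceed 1, here is a minimal sketch in plain Python. It is not Lets-Plot's internal code: the helper name `density_histogram` and the sample values are made up for illustration, and it assumes the usual definition density = count / (n * binwidth).

```python
import math
from collections import Counter


def density_histogram(values, binwidth, origin=0.0):
    """Hypothetical helper: map each bin's left edge to count / (n * binwidth)."""
    # Assign every value to the index of the bin it falls into.
    counts = Counter(math.floor((v - origin) / binwidth) for v in values)
    n = len(values)
    return {
        origin + k * binwidth: c / (n * binwidth)
        for k, c in sorted(counts.items())
    }


# Made-up sample: five of the eight values fall into the first 0.5-wide bin.
values = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 1.1, 1.2]
binwidth = 0.5
dens = density_histogram(values, binwidth)

# The bar areas (height * width) sum to 1 ...
total_area = sum(height * binwidth for height in dens.values())
# ... while the tallest bar is higher than 1 because the bins are narrow.
tallest = max(dens.values())
print(dens, total_area, tallest)
```

With a binwidth below 1, each bar has to be taller to cover the same share of the total area, which is why a density axis formatted as a percentage can show values above 100%.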

araichev commented 2 months ago

For each facet group, I'm looking for a histogram of the counts within the group divided by the total count within the group, expressed as a percentage. Thus the percentage bars within each group will sum to 100%.
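The normalisation described here can be sketched in plain Python as follows. This only illustrates the arithmetic and is not how Lets-Plot computes it; the helper name `group_proportions`, the bin width, and the (group, value) rows are made up for the example.

```python
import math
from collections import Counter, defaultdict


def group_proportions(rows, binwidth):
    """Hypothetical helper: for (group, value) rows, return
    {group: {bin_left_edge: share of that group's rows in the bin}}."""
    counts = defaultdict(Counter)
    for group, value in rows:
        # The left edge of the bin that contains this value.
        edge = math.floor(value / binwidth) * binwidth
        counts[group][edge] += 1
    result = {}
    for group, bins in counts.items():
        total = sum(bins.values())  # total count within this group only
        result[group] = {edge: c / total for edge, c in sorted(bins.items())}
    return result


# Made-up rows, not taken from the penguins dataset.
rows = [
    ("Adelie", 181), ("Adelie", 186), ("Adelie", 187), ("Adelie", 195),
    ("Gentoo", 211), ("Gentoo", 215), ("Gentoo", 230),
]
shares = group_proportions(rows, binwidth=10)
for group, bins in shares.items():
    # Format each share as a percentage of its own group.
    print(group, {edge: f"{share:.0%}" for edge, share in bins.items()})
```

Because each bin count is divided by its own group's total, the shares within every group add up to 1 regardless of the bin width.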

alshan commented 2 months ago

Hi @araichev, we've just added the ..sumprop.. and ..sumpct.. computed variables to the "bin" statistic. This should cover your use case; see https://nbviewer.org/github/JetBrains/lets-plot/blob/master/docs/f-24f/new_stat_bin_vars.ipynb

UPD: v4.4.1