Closed araichev closed 2 months ago
Hello!
To make a histogram display density instead of counts:
+ geom_histogram(aes(y='..density..'))
To format it as a percentage:
+ scale_y_continuous(format=".0%")
Could you please clarify what you mean by "faceted histogram"?
Thanks for your response, @ASmirnov-HORIS. To be clear, I want to keep the histogram as a bar chart and not do kernel density estimation. Re faceting, here's an example of what I mean: https://seaborn.pydata.org/examples/faceted_histogram.html , but instead of counts I want percentages relative to the individual group counts. Does that make sense?
Sorry if I confused you, but aes(y='..density..') does not apply the density statistic to the histogram; it's just a way of normalising the y-values. The normalisation should be such that the total area of the bars is 1, but we seem to have found a bug in our formulas, and so far this is not the case.
Nevertheless, here is the code in Lets-Plot, based on your demo:
import pandas as pd
from lets_plot import *

LetsPlot.setup_html()

df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/penguins.csv")

ggplot(df, aes(x="flipper_length_mm")) + \
    geom_histogram(aes(y='..density..'), binwidth=3, center=1) + \
    scale_y_continuous(format=".0%") + \
    facet_grid(x="species", y="sex", y_order=-1)
Thanks for the clarification and example @ASmirnov-HORIS . Yes, with that code my plots have bars exceeding 100%, so something is wrong with the Lets-Plot formulas. I'll keep an eye on Issue 1157.
I see.
Bar heights may exceed 1 (=100%) if binwidth is less than 1 because ..density.. is normalized by plot area. Are you looking for a different normalization (so that the sum of the values equals 1)?
You could also check the geom_bar() API or demo notebook. Let us know if there are any variables you would like to see in geom_histogram().
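The difference between the two normalizations can be sketched in plain Python. This assumes the textbook definition of density, count / (n * binwidth); it is not taken from Lets-Plot's source, and the sample values and helper names are made up for illustration.

```python
import math
from collections import Counter


def bin_counts(values, binwidth):
    """Count values per bin; bin k covers [k * binwidth, (k + 1) * binwidth)."""
    return Counter(math.floor(v / binwidth) for v in values)


def density(counts, binwidth):
    """Area normalization: bar heights times binwidth sum to 1."""
    n = sum(counts.values())
    return {k: c / (n * binwidth) for k, c in counts.items()}


def proportion(counts):
    """Sum normalization: bar heights themselves sum to 1."""
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}


# Illustrative values only (not from the penguins dataset)
values = [0.1, 0.2, 0.3, 0.4, 0.45, 0.7, 1.2, 1.4]
binwidth = 0.5

counts = bin_counts(values, binwidth)
dens = density(counts, binwidth)
prop = proportion(counts)

# With binwidth < 1 a density bar can be taller than 1,
# while a proportion bar never can.
print(max(dens.values()), max(prop.values()))
```

With a bin width of 0.5, a bin holding 5 of the 8 values gets a density height above 1 even though the total bar area is still 1; the proportions stay at or below 1 and add up to 1.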
For each facet group, I'm looking for a histogram of the counts within the group divided by the total count within the group, expressed as a percentage. Thus the percentage bars within each group will sum to 100%.
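As a rough sketch of the requested normalization, the per-group percentages could be computed by hand as follows. The group labels, values, and the `group_percentages` helper are hypothetical and only illustrate the arithmetic; they are not part of Lets-Plot.

```python
import math
from collections import Counter, defaultdict


def group_percentages(rows, binwidth):
    """rows: iterable of (group, value) pairs.

    Returns {group: {bin_index: percent}}, where each group's
    percentages are relative to that group's own total count.
    """
    counts = defaultdict(Counter)
    for group, value in rows:
        counts[group][math.floor(value / binwidth)] += 1

    result = {}
    for group, bins in counts.items():
        total = sum(bins.values())
        result[group] = {b: 100.0 * n / total for b, n in sorted(bins.items())}
    return result


# Made-up sample rows, not real measurements
rows = [("a", 181), ("a", 186), ("a", 187), ("a", 195), ("b", 211), ("b", 215)]
pct = group_percentages(rows, binwidth=3)

for group, bins in pct.items():
    print(group, bins, "total:", sum(bins.values()))
```

Each group is normalized by its own count, so the bars in every facet sum to 100% regardless of how many rows the other groups have.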
Hi @araichev, we've just added the ..sumprop.. and ..sumpct.. computed variables to the "bin" statistic. This should cover your use case, see https://nbviewer.org/github/JetBrains/lets-plot/blob/master/docs/f-24f/new_stat_bin_vars.ipynb
UPD: v4.4.1
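A minimal sketch of how the earlier faceted example might look with the new variable, assuming Lets-Plot v4.4.1 or later. It only swaps ..density.. for ..sumprop.. and assumes the statistic is computed separately for each facet panel, which is worth checking against the linked notebook. The function name and the lazy import are choices made for this sketch.

```python
# Data URL taken from the example earlier in this thread
PENGUINS_URL = (
    "https://raw.githubusercontent.com/JetBrains/lets-plot-docs/"
    "master/data/penguins.csv"
)

# Computed variable of the "bin" statistic mentioned above (proportions in 0..1)
Y_VAR = "..sumprop.."


def faceted_percent_histogram(df):
    """Build a faceted histogram whose bars show proportions of the total count.

    Assumption: the proportions are computed per facet panel, so the bars
    in each panel sum to 100%.
    """
    # Imported here so this file can be loaded without Lets-Plot installed
    import lets_plot as lp

    return (
        lp.ggplot(df, lp.aes(x="flipper_length_mm"))
        + lp.geom_histogram(lp.aes(y=Y_VAR), binwidth=3, center=1)
        + lp.scale_y_continuous(format=".0%")  # show 0..1 values as percentages
        + lp.facet_grid(x="species", y="sex", y_order=-1)
    )
```

In a notebook, this would be called after `LetsPlot.setup_html()` with `faceted_percent_histogram(pd.read_csv(PENGUINS_URL))`. Since ..sumpct.. presumably already holds percentage values, it would need a different axis format than ".0%".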
Without having to create the percentages in your dataframe ahead of time? Seems possible in ggplot: https://forum.posit.co/t/trouble-scaling-y-axis-to-percentages-from-counts/42999/3 .