eikeluedeling / decisionSupport


plot_distributions(method = "hist_simple_overlay") histogram stacks #39

Open JBSLutum opened 1 year ago

JBSLutum commented 1 year ago

The histogram created with plot_distributions(method = "hist_simple_overlay") stacks the values of the variables on top of each other. This should be fixable with position = "identity". For other geoms such as geom_density this is the default, but for geom_histogram the default is position = "stack".

Example plots from ?plot_distributions: in the plot that shows both variables together, the bins stack up to about 500, whereas in the single plots the tallest bin is around 250.
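To illustrate the difference between the two position settings, here is a minimal sketch using ggplot2 directly rather than plot_distributions(). The data frame `df` and its columns `value` and `variable` are made up for the example and are not taken from the package:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: two overlapping variables in long format
df <- data.frame(
  value    = c(rnorm(1000, mean = 0), rnorm(1000, mean = 0.5)),
  variable = rep(c("a", "b"), each = 1000)
)

# geom_histogram() stacks the groups by default (position = "stack")
p_stack <- ggplot(df, aes(x = value, fill = variable)) +
  geom_histogram(bins = 30)

# position = "identity" draws every group from zero, so the bars overlay
p_identity <- ggplot(df, aes(x = value, fill = variable)) +
  geom_histogram(bins = 30, position = "identity", alpha = 0.5)

# Height of the tallest bar in each plot, taken from the computed layer data
max_stack    <- max(layer_data(p_stack)$ymax)
max_identity <- max(layer_data(p_identity)$ymax)
c(stack = max_stack, identity = max_identity)
```

With stacking, the tallest bar is the sum of both groups in that bin, so it comes out higher than with the overlay.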

[Attached plots: Rplot, Rplot01, Rplot02]

CWWhitney commented 1 year ago

Thanks for catching that - seems to be a quick fix - hopefully we can get to this in the next set of big changes

EduardoFernandezC commented 1 year ago

Nice that you found this bug @JBSLutum! However, I think this may only be a partial bug. As I understand it, the problem here is the x-axis and the number of bins. If we plot each distribution individually, each plot has a different x-axis range (from -50 K to 150 K in the first plot and from -25 K to 75 K in the second one) but the same number of bins (150 by default; I just found a small issue there as well). Under these settings, the bin width scales with the range of each distribution, so the number of observations per bin is quite similar in both plots regardless of the x-axis.

If we now plot both distributions in the same plot, the bins are set according to the distribution with the largest range. This of course affects the width of the bins: the distribution with the narrower range now gets wider bins than it had in its own plot. That explains the increase in the number of observations per bin when you plot both distributions together.
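The effect of the shared range can be checked with a small base R sketch. The two samples below are invented for illustration, and the binning is a simple equal-width approximation, not necessarily the exact binning that ggplot2 or plot_distributions() performs:

```r
set.seed(1)
# Hypothetical samples: one narrow and one wide distribution
narrow <- rnorm(10000, mean = 25000, sd = 12500)
wide   <- rnorm(10000, mean = 50000, sd = 25000)

n_bins <- 150  # fixed number of bins, as in the default discussed above

# Bin width when 'narrow' is plotted alone vs. on the combined range
width_alone    <- diff(range(narrow)) / n_bins
width_combined <- diff(range(c(narrow, wide))) / n_bins

# Largest bin count of x when binned with equal-width bins of size 'width'
max_count <- function(x, width) {
  breaks <- seq(min(x), max(x) + width, by = width)
  max(hist(x, breaks = breaks, plot = FALSE)$counts)
}

count_alone    <- max_count(narrow, width_alone)
count_combined <- max_count(narrow, width_combined)
c(alone = count_alone, combined = count_combined)
```

The same data produce taller bars on the combined range simply because each bin is wider there.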

Now, in the new pull request (see #40), I added a new parameter to set the width of the bins. When it is used, the effect of the x-axis is no longer present: both the individual and the combined plots show the same number of observations per bin.
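As a rough illustration of why a fixed bin width removes the effect, the sketch below uses the `binwidth` argument of ggplot2's geom_histogram() directly. It does not use the new parameter from the pull request, and the data frame `df` is made up for the example:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: one narrow and one wide distribution in long format
df <- data.frame(
  value    = c(rnorm(10000, mean = 25000, sd = 12500),
               rnorm(10000, mean = 50000, sd = 25000)),
  variable = rep(c("a", "b"), each = 10000)
)

# Variable "a" on its own, with a fixed bin width
p_single <- ggplot(subset(df, variable == "a"), aes(x = value)) +
  geom_histogram(binwidth = 1000)

# Both variables overlaid, with the same fixed bin width
p_both <- ggplot(df, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1000, position = "identity", alpha = 0.5)

# Tallest bin of variable "a" in each plot (group 1 is "a" in the overlay)
both_data     <- layer_data(p_both)
max_a_single  <- max(layer_data(p_single)$count)
max_a_overlay <- max(both_data$count[both_data$group == 1])
c(single = max_a_single, overlay = max_a_overlay)
```

Because the bin width no longer depends on the plotted range, the counts for a variable should agree between its own plot and the overlay.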

To the others, let me know if I missed anything...