arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Reformatting the Highest Density Interval (HDI) result on the posterior plot from percentage to float #1698

Open serdarsemih opened 3 years ago

serdarsemih commented 3 years ago

I ran a Bayesian mean-difference test and plotted the posterior of the estimated parameter. I set the HDI probability to 0.995, but ArviZ's plot_posterior function rounds the value to 100% when displaying it on the plot, as seen in the following figure. I need the plot to display 99.5%, which is the exact probability of the credible interval. Although the "round_to" argument controls the formatting of floats, it has no effect on the HDI percentage. I think ArviZ should allow this, because the social sciences are likely to move the statistical significance threshold from 0.05 to 0.005 in the near future. Please see the comment by Benjamin et al. (2018).

[Figure: mdiff]
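The rounding described here can be reproduced without ArViZ. The snippet below is only an illustration of the effect, assuming the label is produced by formatting the probability as a percentage with a fixed number of decimals; `format_as_percent_sketch` is a hypothetical helper, not ArViZ's own function.

```python
def format_as_percent_sketch(prob, decimals=0):
    """Render a probability in [0, 1] as a percentage string.

    Hypothetical helper for illustration; not part of the ArViZ API.
    """
    return f"{100 * prob:.{decimals}f}%"


hdi_prob = 0.995

# With zero decimals, 99.5 is rounded up and the label reads "100%".
label_zero_decimals = format_as_percent_sketch(hdi_prob, 0)

# With one decimal, the label keeps the value that was requested.
label_one_decimal = format_as_percent_sketch(hdi_prob, 1)

print(label_zero_decimals)  # 100%
print(label_one_decimal)    # 99.5%
```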

Reference: Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10.

OriolAbril commented 3 years ago

Should round_to, format_sig_figures, and the other number formatting functions be provided by the labeller? And should we then allow different rounding options for every value/type of value?

serdarsemih commented 3 years ago

In my case, when I set the hdi_prob parameter of plot_posterior to 0.995, the plot should display 99.5% instead of 100%. I think a percentage is the most concise way to display the credible interval, so other value types may not be needed. The round_to parameter also works well.

OriolAbril commented 3 years ago

We currently have only one round_to argument, which is used for the HDI endpoints and the point estimate but not for the HDI percentage, and I think this is good: we can easily need 3-5 digits for point estimates and HDI limits, but we'll generally need only 0-1 decimal places for the HDI percentage. Hardcoding the decimal places of the HDI percentage to 0 is not good and we should change it, but I don't think reusing round_to for it is the solution. By extending what the labeller handles, we can apply different formats to probabilities (even choosing between probability and percentage) and to values.
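As a rough sketch of that idea, the snippet below keeps separate formatting settings for values and for probabilities. Everything in it (the `NumberFormatSketch` class, its fields, and `hdi_label`) is hypothetical and only illustrates the separation; it is not the existing labeller API.

```python
from dataclasses import dataclass


@dataclass
class NumberFormatSketch:
    """Hypothetical container for per-type number formatting options."""

    value_decimals: int = 2       # point estimates and HDI endpoints
    prob_decimals: int = 1        # decimals of the displayed probability
    prob_as_percent: bool = True  # show 99.5% rather than 0.995

    def format_value(self, x):
        return f"{x:.{self.value_decimals}f}"

    def format_prob(self, prob):
        if self.prob_as_percent:
            return f"{100 * prob:.{self.prob_decimals}f}%"
        return f"{prob:.{self.prob_decimals}f}"


def hdi_label(fmt, prob, lower, upper):
    """Build an HDI label with independent formats for prob and endpoints."""
    return (
        f"{fmt.format_prob(prob)} HDI: "
        f"[{fmt.format_value(lower)}, {fmt.format_value(upper)}]"
    )


percent_label = hdi_label(NumberFormatSketch(), 0.995, -0.1234, 0.8712)
prob_label = hdi_label(
    NumberFormatSketch(value_decimals=3, prob_decimals=3, prob_as_percent=False),
    0.995, -0.1234, 0.8712,
)
print(percent_label)
print(prob_label)
```

Keeping the two settings apart means the endpoints can use three or more decimals while the probability stays at zero or one, which a single shared round_to cannot express.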

serdarsemih commented 3 years ago

It would be great if the labeller provided those options. Many thanks.