JuliaStats / GLM.jl

Generalized linear models in Julia

Feature Request: Better formats for the GLM outputs #543

Closed by itsdebartha 1 year ago

itsdebartha commented 1 year ago

I noticed that the coefficient tables printed by GLM can be hard to read, mainly because the number of digits after the decimal point varies from row to row. Consider the case below:

Coefficients:
────────────────────────────────────────────────────────────────────────────────────
                        Coef.  Std. Error       z  Pr(>|z|)   Lower 95%    Upper 95%
────────────────────────────────────────────────────────────────────────────────────
age                   265.742     13.6073   19.53    <1e-84     239.072     292.412
sex: female        -12421.8     1153.82    -10.77    <1e-26  -14683.3    -10160.4
sex: male          -12754.0     1167.79    -10.92    <1e-27  -15042.8    -10465.1
bmi                   346.191     32.9237   10.51    <1e-25     281.662     410.72
children              566.056    156.284     3.62    0.0003     259.746     872.367
smoker: yes         24129.1      472.532    51.06    <1e-99   23203.0     25055.2
region: northwest    -494.076    549.712    -0.90    0.3688   -1571.49      583.34
region: southeast   -1217.09     557.101    -2.18    0.0289   -2308.99     -125.191
region: southwest   -1111.09     548.291    -2.03    0.0427   -2185.72      -36.4612
────────────────────────────────────────────────────────────────────────────────────

I suggest fixing the number of digits after the decimal point in each column, perhaps based on the maximum number of decimal digits that appears in that column, and padding the remaining values with trailing zeros. Something like:

Coefficients:
────────────────────────────────────────────────────────────────────────────────────
                        Coef.  Std. Error       z  Pr(>|z|)   Lower 95%    Upper 95%
────────────────────────────────────────────────────────────────────────────────────
age                   265.742     13.6073   19.53    <1e-84     239.072     292.4120
sex: female        -12421.800   1153.8200  -10.77    <1e-26  -14683.300  -10160.4000
sex: male          -12754.000   1167.7900  -10.92    <1e-27  -15042.800  -10465.1000
bmi                   346.191     32.9237   10.51    <1e-25     281.662     410.7200
children              566.056    156.2840    3.62    0.0003     259.746     872.3670
smoker: yes         24129.100    472.5320   51.06    <1e-99   23203.000   25055.2000
region: northwest    -494.076    549.7120   -0.90    0.3688   -1571.490     583.3400
region: southeast   -1217.090    557.1010   -2.18    0.0289   -2308.990    -125.1910
region: southwest   -1111.090    548.2910   -2.03    0.0427   -2185.720     -36.4612
────────────────────────────────────────────────────────────────────────────────────

In my opinion, this looks more orderly than the current output.
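A minimal sketch of the padding rule described above, assuming each column is handled independently and that every value prints without an exponent. The helpers `compact_decimals` and `fixed_column` are hypothetical names for illustration; they are not part of GLM.jl or StatsBase.jl.

```julia
using Printf

# Hypothetical helper: number of digits after the decimal point in
# Base's compact representation of `x` (assumes no exponent notation).
function compact_decimals(x::Real)
    s = sprint(show, x; context = :compact => true)
    i = findfirst('.', s)
    return i === nothing ? 0 : length(s) - i
end

# Hypothetical helper: format every value in a column with the largest
# number of decimals found in that column, then right-align the strings.
function fixed_column(xs::AbstractVector{<:Real})
    d = maximum(compact_decimals, xs)
    strs = [Printf.format(Printf.Format("%.$(d)f"), x) for x in xs]
    w = maximum(length, strs)
    return lpad.(strs, w)
end

# First four coefficients from the table above.
coef = [265.742, -12421.8, -12754.0, 346.191]
foreach(println, fixed_column(coef))
#    265.742
# -12421.800
# -12754.000
#    346.191
```

Note that the padded zeros are not additional precision: the underlying estimates have more digits than the compact output shows, so a real implementation would probably format the original values rather than append zeros.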

ararslan commented 1 year ago

The printing of the table of coefficients is actually not controlled by GLM but rather by the show method for CoefTable in StatsBase. It uses a couple of internal types, PValue and TestStat, that control the fixed-size printing of p-values and test statistics, respectively. The remaining numbers are printed in a way modeled after how Base prints arrays: using vertical decimal alignment with "compact" printing of numbers (fewer decimal digits).
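The effect of those two ingredients can be reproduced with Base alone. This is only a rough sketch of the behavior, not the code StatsBase actually runs; `compact` and `align_decimal` are hypothetical helpers written for this illustration.

```julia
# Compact printing keeps about six significant digits and drops trailing zeros.
compact(x) = sprint(show, x; context = :compact => true)

# Hypothetical helper: align strings on the decimal point by padding the
# integer part on the left and the fractional part on the right.
function align_decimal(strs::Vector{String})
    parts = [split(s, '.'; limit = 2) for s in strs]
    left  = [String(p[1]) for p in parts]
    right = [length(p) == 2 ? "." * p[2] : "" for p in parts]
    lw, rw = maximum(length, left), maximum(length, right)
    return [lpad(l, lw) * rpad(r, rw) for (l, r) in zip(left, right)]
end

println(compact(13.607312345))   # 13.6073

# Lower 95% bounds from the first rows of the table in the issue.
lower = [239.072, -14683.3, -15042.8, 281.662]
foreach(println, align_decimal(compact.(lower)))
#    239.072
# -14683.3
# -15042.8
#    281.662
```

Because each number keeps only its own significant digits, rows in the same column end up with different numbers of decimals, which is the raggedness the issue describes.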

ParadaCarleton commented 1 year ago

@itsdebartha can you make an issue for this in StatsBase.jl?

itsdebartha commented 1 year ago

> @itsdebartha can you make an issue for this in StatsBase.jl?

Sure