Cox Regression output, significance stars summary_col in statsmodels

👋 hi there

So lifelines previously had significance stars - the last version that had them was v0.14.6. So you could rollback to that version, but lots of improvements had happened since then so this is my least favourite option. pip install lifelines==v0.14.6

You could write some python code to add them. For example, rewrite the entire print_summary. (check the stars in significance_code below since they may not be convention):


def significance_code(p):
if p < 0.001:
    return '***'
elif p < 0.01:
    return '**'
elif p < 0.05:
    return '*'
elif p < 0.1:
    return '.'
else:
    return ' '

def print_summary(cph, decimals=2, **kwargs): """ Print summary statistics describing the fit, the coefficients, and the error bounds.

    Parameters
    -----------
    decimals: int, optional (default=2)
        specify the number of decimal places to show
    kwargs:
        print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing
        multiple outputs.

    """

    # Print information about data first
    justify = string_justify(18)
    print(cph)
    print("{} = '{}'".format(justify("duration col"), cph.duration_col))
    if cph.event_col:
        print("{} = '{}'".format(justify("event col"), cph.event_col))
    if cph.weights_col:
        print("{} = '{}'".format(justify("weights col"), cph.weights_col))

    if cph.cluster_col:
        print("{} = '{}'".format(justify("cluster col"), cph.cluster_col))

    if cph.robust or cph.cluster_col:
        print("{} = {}".format(justify("robust variance"), True))

    if cph.strata:
        print("{} = {}".format(justify("strata"), cph.strata))

    if cph.penalizer > 0:
        print("{} = {}".format(justify("penalizer"), cph.penalizer))

    print("{} = {}".format(justify("number of subjects"), cph._n_examples))
    print("{} = {}".format(justify("number of events"), cph.event_observed.sum()))
    print("{} = {:.{prec}f}".format(justify("log-likelihood"), cph._log_likelihood, prec=decimals))
    print("{} = {}".format(justify("time fit was run"), cph._time_fit_was_called))

    for k, v in kwargs.items():
        print("{} = {}\n".format(justify(k), v))

    print(end="\n")
    print("---")

    df = cph.summary
    # Significance codes as last column
    df[""] =  [significance_code(p) for p in df['p']]
    print(df.to_string(float_format=format_floats(decimals), formatters={"p": format_p_value(decimals)}))

    # Significance code explanation
    print("---")
    print("Concordance = {:.{prec}f}".format(cph.score_, prec=decimals))
    print(
        "Log-likelihood ratio test = {:.{prec}f} on {} df, -log2(p)={:.{prec}f}".format(
            *cph._compute_likelihood_ratio_test(), prec=decimals
        )
    )



3. My preferred solution is to think of another way of expressing significance altogether, without stars. I removed the stars for a reason: https://twitter.com/Cmrn_DP/status/1072951755028103168. I've started including the log2(p) values since they are easier to reason about: https://lesslikely.com/statistics/s-values/

CamDavidsonPilon / lifelines

Cox Regression output, significance stars summary_col in statsmodels #656