CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.38k stars 560 forks source link

Cox Regression output, significance stars summary_col in statsmodels #656

Closed sisiphos2018 closed 5 years ago

sisiphos2018 commented 5 years ago

I am trying to set nice regression tables. Several Cox regressions next to each other with significance stars. In Stata I was using estout. In Python I am experimenting with summary_col in statsmodels for OLS.

Is there a good solution for lifelines Cox Regression?

Thank you

CamDavidsonPilon commented 5 years ago

👋 hi there

  1. So lifelines previously had significance stars - the last version that had them was v0.14.6. So you could rollback to that version, but lots of improvements had happened since then so this is my least favourite option. pip install lifelines==v0.14.6

  2. You could write some python code to add them. For example, rewrite the entire print_summary. (check the stars in significance_code below since they may not be convention):

    
    def significance_code(p):
    if p < 0.001:
        return '***'
    elif p < 0.01:
        return '**'
    elif p < 0.05:
        return '*'
    elif p < 0.1:
        return '.'
    else:
        return ' '

def print_summary(cph, decimals=2, **kwargs): """ Print summary statistics describing the fit, the coefficients, and the error bounds.

    Parameters
    -----------
    decimals: int, optional (default=2)
        specify the number of decimal places to show
    kwargs:
        print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing
        multiple outputs.

    """

    # Print information about data first
    justify = string_justify(18)
    print(cph)
    print("{} = '{}'".format(justify("duration col"), cph.duration_col))
    if cph.event_col:
        print("{} = '{}'".format(justify("event col"), cph.event_col))
    if cph.weights_col:
        print("{} = '{}'".format(justify("weights col"), cph.weights_col))

    if cph.cluster_col:
        print("{} = '{}'".format(justify("cluster col"), cph.cluster_col))

    if cph.robust or cph.cluster_col:
        print("{} = {}".format(justify("robust variance"), True))

    if cph.strata:
        print("{} = {}".format(justify("strata"), cph.strata))

    if cph.penalizer > 0:
        print("{} = {}".format(justify("penalizer"), cph.penalizer))

    print("{} = {}".format(justify("number of subjects"), cph._n_examples))
    print("{} = {}".format(justify("number of events"), cph.event_observed.sum()))
    print("{} = {:.{prec}f}".format(justify("log-likelihood"), cph._log_likelihood, prec=decimals))
    print("{} = {}".format(justify("time fit was run"), cph._time_fit_was_called))

    for k, v in kwargs.items():
        print("{} = {}\n".format(justify(k), v))

    print(end="\n")
    print("---")

    df = cph.summary
    # Significance codes as last column
    df[""] =  [significance_code(p) for p in df['p']]
    print(df.to_string(float_format=format_floats(decimals), formatters={"p": format_p_value(decimals)}))

    # Significance code explanation
    print("---")
    print("Concordance = {:.{prec}f}".format(cph.score_, prec=decimals))
    print(
        "Log-likelihood ratio test = {:.{prec}f} on {} df, -log2(p)={:.{prec}f}".format(
            *cph._compute_likelihood_ratio_test(), prec=decimals
        )
    )


3. My preferred solution is to think of another way of expressing significance altogether, without stars. I removed the stars for a reason: https://twitter.com/Cmrn_DP/status/1072951755028103168. I've started including the log2(p) values since they are easier to reason about: https://lesslikely.com/statistics/s-values/