BiomedSciAI / causallib

A Python package for modular causal inference analysis and model evaluations
Apache License 2.0

Hi. ask a question. #69

Closed agnusdei13 closed 3 months ago

agnusdei13 commented 4 months ago

Thank you for a wonderful package for IPW analysis. Using the causallib package, I analyzed survival outcomes for cancer patient data. However, I could not find a statistical method to show the difference in survival between two groups. For example, I can only get the difference in the survival rate at the final time point. I searched for methods such as reconstruction of the time table from survival curves, but these methods depend on the number of patients at risk. Could you please let me know of any method to show the statistical difference?

ehudkr commented 3 months ago

Hi, thanks for your kind words. I apologize for the late reply; I have been busy with work lately.

It's not clear to me what exactly you mean by "show the difference between two groups". causallib allows you to estimate the counterfactual survival curve for each treatment group, which allows you to take any contrast of survival (e.g., survival difference or survival ratio) at any point along the survival curves (even at every time point). This will give you a point estimate. If you want proper inference and uncertainty estimates, it is best to just bootstrap the entire process (see section 5 / figure 1 in Austin 2016: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157758#sim7084-fig-0001).

It is also not clear to me what "reconstruction of time-table from survival curves" is, and I was not able to find any material I deemed relevant for statistical differences on Google. Could you please elaborate so I can try to help?
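To make "bootstrap the entire process" concrete, here is a minimal pure-Python sketch. It is not causallib code: `fit_weights`, `km_survival_at`, `survival_difference`, and `bootstrap_ci` are hypothetical helpers, the data are simulated, and the weight model is just the empirical treatment frequency within a single binary covariate. In practice each resample would refit the IPW model and recompute the weighted survival curves with your own estimators.

```python
import random

def fit_weights(rows):
    """Hypothetical stand-in for refitting the IPW model: inverse of the
    empirical probability of the received treatment within each covariate level."""
    counts, totals = {}, {}
    for x, a, _, _ in rows:
        counts[(x, a)] = counts.get((x, a), 0) + 1
        totals[x] = totals.get(x, 0) + 1
    return [totals[x] / counts[(x, a)] for x, a, _, _ in rows]

def km_survival_at(times, events, weights, horizon):
    """Weighted Kaplan-Meier survival probability at `horizon`."""
    surv = 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e and ti <= horizon}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights) if e and ti == t)
        surv *= 1.0 - died / at_risk
    return surv

def survival_difference(rows, horizon):
    """Refit the weights, then contrast weighted survival (treated - untreated)."""
    w = fit_weights(rows)
    surv = {}
    for arm in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[1] == arm]
        surv[arm] = km_survival_at(
            [rows[i][2] for i in idx], [rows[i][3] for i in idx],
            [w[i] for i in idx], horizon,
        )
    return surv[1] - surv[0]

def bootstrap_ci(rows, horizon, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample subjects, then rerun the whole pipeline."""
    rng = random.Random(seed)
    stats = sorted(
        survival_difference(rng.choices(rows, k=len(rows)), horizon)
        for _ in range(n_boot)
    )
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated rows of (covariate x, treatment a, time t, event indicator y)
rng = random.Random(1)
rows = []
for _ in range(200):
    x = int(rng.random() < 0.5)
    a = int(rng.random() < (0.7 if x else 0.3))
    t = rng.expovariate(0.2 - 0.08 * a + 0.05 * x)
    c = rng.uniform(1, 15)  # censoring time
    rows.append((x, a, min(t, c), int(t <= c)))

point = survival_difference(rows, horizon=5.0)
lo, hi = bootstrap_ci(rows, horizon=5.0)
print(f"survival difference at t=5: {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The key design point is that the weights are re-estimated inside every resample, so the interval reflects uncertainty in the weight model as well as in the survival estimate.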

agnusdei13 commented 3 months ago

Thank you for your response. In medical research, the log-rank test is often used to measure the survival difference between two groups. The calculation is based on the differences between observed and expected deaths in the two groups. You can check the method in reference 1 below (KM survival analysis). Because the method is based on the number of patients either alive or dead, the results depend on the number of patients (N). I understand your method of using IPTW and logistic regression to predict survival at any point in time, including the iteration, but is there any way to get the log-rank test, or any other way to know the size of the pseudo-population or the total number of patients? Many medical researchers still use the log-rank test to present the treatment effect or survival difference.

Reconstruction of the time table from survival curves is a published method: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3313891/

1) KM survival analysis https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3059453/

2) adjusted ipw km analysis https://www.sciencedirect.com/science/article/pii/S0169260703001378?via%3Dihub#aep-section-id8
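As a rough illustration of the observed-versus-expected calculation described above, here is a minimal pure-Python sketch of a two-group log-rank statistic that optionally accepts per-subject weights. The function name `weighted_logrank` and the toy data are hypothetical. Applying the standard hypergeometric variance to weighted counts is an assumption of this sketch, not necessarily the variance estimator proposed in the adjusted-KM reference.

```python
import math

def weighted_logrank(times, events, groups, weights=None):
    """Two-group log-rank test; `groups` holds 0/1 labels.

    With weights=None this is the ordinary log-rank test. With weights,
    at-risk and event counts are replaced by sums of weights.
    """
    if weights is None:
        weights = [1.0] * len(times)
    observed = expected = variance = 0.0
    # Loop over the distinct times at which at least one event occurred
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = [0.0, 0.0]  # (weighted) number at risk per group
        d = [0.0, 0.0]  # (weighted) number of events per group
        for ti, e, g, w in zip(times, events, groups, weights):
            if ti >= t:
                n[g] += w
                if e and ti == t:
                    d[g] += w
        n_all, d_all = n[0] + n[1], d[0] + d[1]
        observed += d[1]
        expected += d_all * n[1] / n_all  # expected events in group 1 under H0
        if n_all > 1:
            variance += (
                n[0] * n[1] * d_all * (n_all - d_all)
                / (n_all ** 2 * (n_all - 1))
            )
    statistic = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Toy data: group 0 fails early, group 1 fails late, no censoring
stat, p = weighted_logrank(
    times=[1, 2, 3, 4], events=[1, 1, 1, 1], groups=[0, 0, 1, 1]
)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

Because the statistic is built from counts at risk and counts of events, it depends on N directly, which is why a reweighted sample changes the result.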


ehudkr commented 3 months ago

I see. If you want a covariate-adjusted log-rank test, there's no need to reconstruct the dataset from survival curves, as you already have the data. All you need is to perform an inverse probability weighted log-rank test, which is like performing the log-rank test on the pseudo-population in which everyone is either treated or untreated.

Assuming X holds your covariates/confounders, a is the treatment assignment, t is the time to event, y is the event indicator, and ipw is an already initialized IPW model from causallib (it can be pulled from the WeightedSurvival model you used for the survival curves), you can calculate an IP-weighted log-rank test using lifelines (https://lifelines.readthedocs.io/en/latest/lifelines.statistics.html#lifelines.statistics.logrank_test):

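from lifelines.statistics import logrank_test

# Inverse probability weight for each subject's observed treatment
w = ipw.compute_weights(X, a)

# Split durations, event indicators, and weights by treatment group
untreated = a == 0
durations_A = t[untreated]
durations_B = t[~untreated]
event_observed_A = y[untreated]
event_observed_B = y[~untreated]
weights_A = w[untreated]
weights_B = w[~untreated]

# Log-rank test on the weighted pseudo-population
res = logrank_test(
    durations_A, durations_B,
    event_observed_A, event_observed_B,
    weights_A=weights_A,
    weights_B=weights_B,
)
res.print_summary()  # test statistic and p-value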

Hope that helps.
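On the question of the pseudo-population size, one simple summary is the sum of the weights in each group, optionally alongside Kish's effective sample size as a rough indication of how much information the weighted group carries. A minimal sketch, assuming `weights` and `treatment` are plain sequences (for example, the values of `w` and `a` above); the function name `pseudo_population_summary` and the toy numbers are hypothetical:

```python
def pseudo_population_summary(weights, treatment):
    """Per-arm observed count, pseudo-population size, and Kish effective size."""
    summary = {}
    for arm in sorted(set(treatment)):
        w = [wi for wi, ai in zip(weights, treatment) if ai == arm]
        summary[arm] = {
            "n_observed": len(w),  # actual number of patients in the arm
            "pseudo_n": sum(w),    # size of the weighted pseudo-population
            "effective_n": sum(w) ** 2 / sum(wi * wi for wi in w),  # Kish ESS
        }
    return summary

# Toy weights for four subjects, two per arm
summary = pseudo_population_summary(
    weights=[2.0, 2.0, 1.5, 3.0], treatment=[0, 0, 1, 1]
)
for arm, stats in summary.items():
    print(arm, stats)
```

With unstabilized weights, each arm's pseudo-population is expected to be roughly the size of the full sample, so the effective size is usually the more informative number to report.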


More generally, please note there are recent trends to shift away from log-rank tests towards Cox regression-based statistics, which you may want to consider: https://discourse.datamethods.org/t/when-is-log-rank-preferred-over-univariable-cox-regression/2344/2?u=ehudk

agnusdei13 commented 3 months ago

Thank you so much! I really appreciate your help!

Sincerely
