CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.35k stars 558 forks source link

Generic ParametricUnivariate mixture cure model #930

Closed bluemoo closed 4 years ago

bluemoo commented 4 years ago

Would it be of interest to have a ParametricUnivariateFitter that adds a 'cure' component to a specified non-cure model? This could operate in a manner similar to the flexsurvreg R package - wrapping the functions of some baseline fitter. I suggest it because I've found such a thing useful in my own work, when selecting which underlying model to use.

Here is a quickly put together example of what I mean. If there is interest, I'd be willing to get into a better shape, with guidance.

class  MixtureCureModel(ParametricUnivariateFitter):
        def __init__(self, base_fitter, *args, **kwargs):
            self._base_fitter = base_fitter
            self._fitted_parameter_names = ["theta__"] + base_fitter._fitted_parameter_names
            self._bounds = [(0.0, 1.0)] + base_fitter._bounds
            super().__init__(*args, **kwargs)

        def _cumulative_hazard(self, params, times):
            theta__, lambda_, k_ = params
            sf = self._base_fitter._survival_function(params[1:], times)
            return -np.log(theta__ + (1 - theta__)*sf)

        def _create_initial_point(self, Ts, E, *args):
            self._base_fitter._censoring_type = self._censoring_type
            base_point = self._base_fitter._create_initial_point(Ts, E, *args)
            return np.array([.5] + list(base_point))
CamDavidsonPilon commented 4 years ago

Yesss! This is a really cool idea.

CamDavidsonPilon commented 4 years ago

So, I'm stuck on what the API should look like. I really like the idea of making it trivial to "layer" in a cure model into an existing model.

bluemoo commented 4 years ago

Here are all the versions I've considered, with the assumption that long term you'll want to parallel the capabilities of flexsurvcure, where they make sense. I think you've been trying to avoid having any mapping between string parameters and fitters, so there won't be anything like dist='weibull'.

1. Class builder function: This would parallel the way that the sklearn_adapter function works. It takes a non-cure class and a few parameters, and returns a new class that the user then can instantiate.

def cure_model(base_fitter_klass: ParametricUnivariateFitter, link=expit, mixture=True) -> Type['MixtureCureFitter']:
        ...
        return new_klass

MixtureCureLogLogistic = cure_model(LogLogisticFitter, link=lambda x: exp(-exp(x)), mixture=True)
fitter = CureLogLogistic()

2. A Cure Model class with passed instance: A pair of MixtureCureModelFitter and NonMixtureCureModelFitter classes that take instances of a ParametricUnivariateFitter and are implemented in terms of that fitter. Changing things like the cure parameter link function can just be done via subclassing. This has the advantage of totally decoupling how the cure model portion and the base fitter are instantiated. I think it's probably the easiest to implement approach, but that's just a guess. It's what I've actually used for my own work.

class  MixtureCureModelFitter(ParametricUnivariateFitter):
        def __init__(self, base_fitter, *args, **kwargs):
                self._base_fitter = base_fitter
                ...
                super().__init__(*args, **kwargs)

fitter = MixtureCureModelFitter(LogLogisticFitter())

3. A Cure Model Fitter class with passed class and arguments: A pair of MixtureCureModelFitter and NonMixtureCureModelFitter classes that have a parameter which takes a type that is a subclass of ParametricUnivariateFitter and any needed arguments. Then it instantiates the class itself as needed. I think this is probably a more awkward interface than the previous approach, but there is a lot of state stored in the fitters, and it would allow the Fitter to 'reset' the base_fitter if that becomes necessary.

class  MixtureCureModelFitter(ParametricUnivariateFitter):
        def __init__(self, base_fitter, *args, **kwargs):
                self._base_fitter = base_fitter
                ...
                super().__init__(*args, **kwargs)

fitter = MixtureCureModel(LogLogisticFitter())

4. Split apart fitters and equations: This is a more aggressive solution that involves a big refactoring of...everything. The portion of a class which represents the survival model equations could be separated from the actual mechanics of fitting the data. Then the equation for the non-cure portion could be wrapped in what I would expect is a straightforward way. I think this is closer to how the R packages do it, but I'm not sure that makes it correct.

CamDavidsonPilon commented 4 years ago

Thanks for providing examples, @bluemoo! From what you provided, I think 2. is the best option. If you have something working, feel free to make a PR and we can work from there =)

A bit aside, but your post made me think of lifelines 1.0.

In my head, I'm thinking about how far we could abstract the ideas above. For example, how could we allow users to create more sophisticated models with cures, competing risks, components (like in reliability analysis), etc.

But then I get stuck on:

bluemoo commented 4 years ago

PR https://github.com/CamDavidsonPilon/lifelines/pull/947 shows what option 2 could look like.

There are a lot of outstanding questions and things to do:

Regarding your aside about lifelines 1.0: I think you could at least abstract out some of the pieces, even if only to use them internally and keep the current public interface. Then more advanced users could take those pieces and do what they want with them. Alternatively, you may want to consider what you want your audience to be. If it is strong programmers who are not as well versed in the mathematics of survival analysis, it may be hard for them to take advantage of things like the customer parmetric regression models, and they would be well served by building blocks that abstract away the actual math.

bluemoo commented 4 years ago

I think the link function should be a parameter on the constructor that defaults to expit.

class MixtureCureFitter(..., link=expit)

bluemoo commented 4 years ago

Actually, that by itself doesn't work well. The allowed parameter bounds and default value should probably depend on the specific link function to use (or some kind of future smart default would need to have an inverse function). Maybe the way sklearn scorer specification works would be a reasonable model: allow either a string to use something predefined, or allow an object with a specific interface. Maybe something as simple as a namedtuple that gives the desired function, its inverse, and default bounds.

For reference, here's the code for R/flexsurvcure (link functions are dealt with about halfway down): https://rdrr.io/cran/flexsurvcure/src/R/flexsurvcure.R