events going missing when I use add_covariate_to_timeline()

DavidSorge commented 3 years ago

I'm working on a time-varying Cox regression analysis. I'm trying to add time-varying covariates to a dataset in long format, following the examples here as closely as possible.

When using the add_covariate_to_timeline() function, however, all the time periods in the resultant dataset come back with False in the event column.

[image: covariate_add_problems]

For reference, here's the base df:

[image: base_df]

and here's the covariate df:

[image: covariate_df]

I'm wondering if it has to do with the covariate df being a 'cumulative event' type df? Not sure, so I thought I'd ask.

I'm using lifelines version 0.26.3

Thanks so much!

CamDavidsonPilon commented 3 years ago

Hi @DavidSorge - are you able to supply a small sample dataframe that replicates the problem? That would help debug this.

DavidSorge commented 3 years ago

Yes, that should be pretty easy.

Here's a code snippet that reproduces the problem:

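import pandas as pd
from lifelines.utils import add_covariate_to_timeline

# Load the attached sample files.
base_df_sample = pd.read_csv('base_df_sample.csv', index_col=0)
covariate_sample = pd.read_csv('covariate_sample.csv')

print('Before merge:')
print(base_df_sample.event.value_counts())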
print()

integrated = add_covariate_to_timeline(
    base_df_sample,
    covariate_sample,
    duration_col='start_day',
    id_col='PC_ID',
    event_col='event'
)

print('After merge:')
print(integrated.event.value_counts())

The sample data come from two of the geographical units with the most events (in this case, incidents of unrest). Because they are incidents of unrest, the events in my model are non-absorbing (i.e., a single district can experience multiple events).

The CSV files used by the snippet are attached here:

base_df_sample.csv

covariate_sample.csv

The output I get from running the snippet is:

Before merge:
True     41
False     2
Name: event, dtype: int64

After merge:
False    1420
Name: event, dtype: int64
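
Since the events here are recurrent, one way to cross-check the merged output is to split the intervals by hand and confirm that the event count stays the same. Below is a minimal plain-Python sketch of that idea. It is not lifelines code and makes no claim about how add_covariate_to_timeline() works internally. The function split_intervals and the keys start, stop, time and value are hypothetical names chosen for illustration; only PC_ID and event come from the snippet above. It assumes each row covers the interval (start, stop] and that an event, if any, occurs at stop.

```python
from bisect import bisect_right


def split_intervals(intervals, changes, id_key="PC_ID"):
    """Split (start, stop] rows at covariate change times.

    Hypothetical helper: every original row keeps its own event flag,
    carried by the last piece it is split into.
    """
    # Group covariate changes by id, sorted by time.
    by_id = {}
    for change in changes:
        by_id.setdefault(change[id_key], []).append((change["time"], change["value"]))
    for series in by_id.values():
        series.sort()

    out = []
    for row in intervals:
        series = by_id.get(row[id_key], [])
        times = [t for t, _ in series]
        # Cut only at change times strictly inside the interval.
        cuts = sorted({t for t in times if row["start"] < t < row["stop"]})
        edges = [row["start"]] + cuts + [row["stop"]]
        for lo, hi in zip(edges, edges[1:]):
            # Covariate value in effect at the start of this piece:
            # the latest change at or before lo, or None if there is none.
            k = bisect_right(times, lo)
            value = series[k - 1][1] if k else None
            out.append({
                id_key: row[id_key],
                "start": lo,
                "stop": hi,
                "value": value,
                # The event happens at the original stop time,
                # so only the final piece of the row carries it.
                "event": bool(row["event"]) and hi == row["stop"],
            })
    return out


# Made-up data: one district with two events and a censored final interval.
intervals = [
    {"PC_ID": 1, "start": 0, "stop": 10, "event": True},
    {"PC_ID": 1, "start": 10, "stop": 20, "event": True},
    {"PC_ID": 1, "start": 20, "stop": 30, "event": False},
]
changes = [
    {"PC_ID": 1, "time": 0, "value": 0},
    {"PC_ID": 1, "time": 5, "value": 1},
    {"PC_ID": 1, "time": 15, "value": 2},
]

result = split_intervals(intervals, changes)
events_before = sum(r["event"] for r in intervals)
events_after = sum(r["event"] for r in result)
print(len(result), events_before, events_after)
```

In this toy case the number of True events is the same before and after the split. If a merge step instead returns zero events, as in the output above, one explanation worth checking (an assumption on my part, not something confirmed in this thread) is that it treats each id as having at most one terminal event.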

DavidSorge commented 3 years ago

(Also, for the record, thank you for both (a) a great tool, since I'm really grateful to be able to do my survival analysis in Python rather than having to jump over to R, and (b) your amazingly fast response time, right after I initially posted the issue!)

CamDavidsonPilon / lifelines

events going missing when I use add_covariate_to_timeline() #1343