CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.34k stars, 553 forks

interesting posts on simulation #360

Open CamDavidsonPilon opened 6 years ago

CamDavidsonPilon commented 6 years ago

https://stats.stackexchange.com/questions/105881/how-to-simulate-survival-times-using-true-base-line-hazard-function

CamDavidsonPilon commented 6 years ago

https://github.com/faithghlee/SurvivalDataSimulation/blob/master/survival-data-simulation.pdf

pzivich commented 6 years ago

I wrote some code last week to simulate some survival time data. It could probably be improved, but it's something:

import numpy as np
import pandas as pd


def data_generator(n=10000, seed=101):
    np.random.seed(seed)  # fix the seed for reproducibility
    df = pd.DataFrame(index=range(n))  # empty data frame with n rows
    df['eb'] = 1120176000  # Unix timestamp (seconds) of the earliest possible date
    df['de'] = np.random.uniform(size=len(df)) * 450000000  # uniform random offset in seconds
    df['dob'] = pd.to_datetime(df['eb'] + df['de'], unit='s').dt.date  # convert to a calendar date
    df['z0'] = np.random.binomial(n=1, p=0.7, size=len(df))  # binary baseline confounder
    df['pa0'] = 1 / (1 + np.exp(-(np.log(1) - 4*0.7 + 4*df['z0'])))  # probability of exposure (logistic in z0)
    df['a0'] = np.random.binomial(n=1, p=df['pa0'], size=len(df))  # binary exposure drawn with that probability
    df['predy'] = 18 + 10*df['a0'] - 9*df['z0'] + 0*df['a0']*df['z0']  # Weibull scale; interaction term set to 0
    df['t'] = np.random.weibull(2, size=len(df)) * df['predy']  # survival times: Weibull with shape 2, scale predy
    df['t2'] = np.where(df['t'] > 5, 5, df['t'])  # observed time, administratively censored at 5
    df['delta'] = np.where(df['t'] > 5, 0, 1)  # event indicator: 1 if the event occurred by time 5
    return df
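
For comparison, here is a minimal sketch of the inverse-transform approach that the Stack Exchange question linked above asks about, assuming a proportional hazards model with a Weibull baseline hazard. The function name `simulate_weibull_ph` and the default values for `lam`, `nu`, and `censor_time` are hypothetical choices for illustration, not part of lifelines or of the generator above. Since `np.random.weibull(2)` draws from S(t) = exp(-t^2), scaling by `predy` as in the generator corresponds to `nu = 2` and a rate of `1 / predy**2` in this parameterization.

```python
import numpy as np


def simulate_weibull_ph(x, beta, lam=0.01, nu=2.0, censor_time=5.0, seed=101):
    """Draw survival times under a Weibull proportional hazards model.

    Assumed hazard: h(t | x) = lam * nu * t**(nu - 1) * exp(beta * x),
    so S(t | x) = exp(-lam * t**nu * exp(beta * x)).
    Setting S(t | x) = u for uniform u and solving for t gives
    t = (-log(u) / (lam * exp(beta * x))) ** (1 / nu).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # 1 - U lies in (0, 1], which avoids taking log(0)
    u = 1.0 - rng.uniform(size=x.shape[0])
    t = (-np.log(u) / (lam * np.exp(beta * x))) ** (1.0 / nu)
    observed = np.minimum(t, censor_time)   # administrative censoring
    event = (t <= censor_time).astype(int)  # 1 if the event happened by censor_time
    return t, observed, event


# Example: one binary covariate, hypothetical log hazard ratio of 0.5
x = np.random.default_rng(0).binomial(n=1, p=0.7, size=1000)
t, observed, event = simulate_weibull_ph(x, beta=0.5)
```

A quick sanity check is to compare the sample median of the uncensored times for `x = 0` with the closed-form median `(log(2) / lam) ** (1 / nu)`.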