BPCOG Aim 2 - Dementia


Create a time-to-dementia Cox regression model and produce survivor function data.


Pooled subset of BPCOG cohort participants from ARIC, CHS and FOS


The macro code and commented instructions for its use are also stored in this repository under BP-COG_Aim2_Dementia/Analysis Macros/.

Documentation and original files are here

Analysis file: dem.coxreg_currgcp

Model statement for the regression

model (t_start,t_end)*deminc(0) = gcp_bl interval gcp_slope age0 female0 educ racebpcog;

This is how I tried to run the macro:

proc phreg data = dem.coxreg_currgcp outest = ests;
  class female0 racebpcog educ;
  model (t_start,t_end)*deminc(0) = gcp_bl interval gcp_slope age0 female0 educ racebpcog;
run;

/* one hypothetical test case; DATALINES is needed for INPUT to read the row */
data blcovs;
  input age0 female0 educ racebpcog gcp_bl interval gcp_slope;
  format female0 sexfmt. educ educ. racebpcog racebpcog.;
  datalines;
60 1 5 2 50 1 0
;
run;

%coxtvc(data = dem.coxreg_currgcp,
        y = (t_start, t_end)*deminc(0),
        x = gcp_bl interval gcp_slope age0 female0 educ racebpcog,
        tvvar = interval gcp_slope,
        nontvvar = gcp_bl age0 female0 educ racebpcog,
        covs = blcovs,
        ests = ests,
        addstmts = %str(class female0 racebpcog educ;));


natilton commented 4 years ago

Hi Andrzej,

Jim is hoping to use the model results to calculate the cumulative hazard for test cases. I understand how to make SAS produce survivor function data for test cases, but he would like to use the predicted hazard to check whether the model fits. As far as I know, SAS doesn't produce the baseline hazard function, i.e. h0(t) in h(t) = h0(t)*exp(BX).

So I don't see how to calculate predicted values. While searching for a solution, I came across this page on the ASSESS statement.

Do you think this would help?

Here's the SAS code for running the procedure with test cases:

%let BPCOG_path = S:\Intmed_Rsrch2\GenMed\Restricted\BP COG;
%let BPCOG_DEMpath = &BPCOG_path.\Aim 2\Dementia Model;
libname storemac "&BPCOG_DEMpath.\SAS Compiled Macros";
libname fmts "&BPCOG_path.\Aim 1\Data Management\Data\formats";
libname dem "&BPCOG_DEMpath.\SAS Data";
options fmtsearch=(fmts) nofmterr mstored sasmstore=storemac;

%macro today_YYMMDD();
  %local z y2 m2 d2;
  %let z=0;
  %let y2=%sysfunc(today(),year2.);
  %let m2=%sysfunc(today(),month2.);
  %let d2=%sysfunc(today(),day2.);
  %if %eval(&m2)<=9 %then %let m2 = &z&m2;
  %if %eval(&d2)<=9 %then %let d2 = &z&d2;
  %* the text below is the value returned by the macro (YYMMDD);
  &y2&m2&d2
%mend today_YYMMDD;

%let ymd = %today_YYMMDD();

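/* randomly choose 2 participants from each cohort */
proc sort data=dem.coxreg_currgcp out=currgcp; by newid t_end; run;

/* keep each participant's first (earliest) record */
data currgcp;
  set currgcp;
  by newid;
  if first.newid;
run;

proc surveyselect data=currgcp out=aricsamp method=srs SAMPSIZE=2 SEED=1234567;
  where studyname='aric';
run;

proc surveyselect data=currgcp out=chssamp method=srs SAMPSIZE=2 SEED=1234567;
  where studyname='chs';
run;

proc surveyselect data=currgcp out=fossamp method=srs SAMPSIZE=2 SEED=1234567;
  where studyname='fos';
run;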

data blcovs;
  set aricsamp chssamp fossamp;
  format female0 yna. educ educ. racebpcog racebpcog.;
  keep newid age female0 educ racebpcog gcp_bl gcp_slope;
run;

proc phreg data = dem.coxreg_currgcp;
  class female0 (ref="Yes") racebpcog (ref="Non-Hispanic White")
        educ (ref="College graduate or more (Technical School Certificate, Associate Degree, Bachelor's Degree, Graduate or Professional School)");
  model (t_start,t_end)*deminc(0) = gcp_bl gcp_slope age female0 educ racebpcog;
  baseline covariates=blcovs out=work.bl_surfunc_&ymd survival=S lower=S_lower upper=S_upper;
run;

/* produce 1-year survival prediction */
data work.bl_testcases_&ymd;
  set work.bl_surfunc_&ymd;
  if t_end > 1 and t_end < 1.07;
run;
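A possible refinement of the 1-year selection above, offered as a sketch that has not been run against the BPCOG data: the baseline survivor function is a step function, so the predicted S(1) for each test case is the value at the last event time at or before t = 1, while the window 1 < t_end < 1.07 can return zero or several rows per test case. This assumes newid is carried from blcovs into the OUT= data set and that time is in years; the macro variable horizon and the data set work.bl_sorted are hypothetical.

```sas
%let horizon = 1;   /* hypothetical prediction horizon, in the time units of t_end */

/* order each test case's survival curve by time */
proc sort data=work.bl_surfunc_&ymd out=work.bl_sorted;
  by newid t_end;
run;

/* the last step at or before the horizon carries S(horizon) */
data work.bl_testcases_&ymd;
  set work.bl_sorted;
  by newid t_end;
  where t_end <= &horizon;
  if last.newid;
run;
```

This yields exactly one row per test case, provided each curve has at least one time point at or before the horizon.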

Thanks, Nick

agalecki commented 4 years ago

Hi Nick,

RE: "calculate cumulative hazard for test cases"

Cumulative hazard H(t) can be derived from survival S(t) using the relation S(t) = exp(-H(t)), i.e. H(t) = -log S(t), so this should not be a problem. It can be computed directly within the bl_surfunc_&ymd data set.
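For example, a minimal sketch of that data step, assuming the OUT= data set and the variable S from Nick's BASELINE statement; the output name work.bl_cumhaz_&ymd is hypothetical. The BASELINE statement in PROC PHREG also accepts a CUMHAZ= keyword (e.g. `baseline covariates=blcovs out=... survival=S cumhaz=H;`), which with the default estimation method should give the same quantity directly.

```sas
/* Sketch: cumulative hazard H(t) = -log S(t) for each test case and time */
data work.bl_cumhaz_&ymd;
  set work.bl_surfunc_&ymd;
  if S > 0 then H = -log(S);   /* S(t) = exp(-H(t))  =>  H(t) = -log S(t) */
  else H = .;                  /* H(t) is unbounded where S(t) = 0 */
run;
```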

To obtain the hazard function h(t), we can in principle differentiate H(t) with respect to time, i.e. h(t) = dH(t)/dt.

Obtaining the hazard function h(t) from a Cox regression is problematic because the baseline hazard (and hence S(t) and H(t)) has no specified parametric form. Consequently, S(t) and H(t) are estimated as step functions, and their derivatives with respect to time are not meaningful (zero between event times, undefined at them). To circumvent this, it will be necessary to smooth the H(t) function first (e.g. with kernel smoothing) and then calculate numeric derivatives with respect to time.
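A sketch of the smoothing step, under stated assumptions. It swaps in the Ramlau-Hansen kernel estimator, which applies an Epanechnikov kernel to the jumps of H(t); this equals the time derivative of H(t) convolved with the kernel, so it does "smooth, then differentiate" in one pass. It starts from Nick's bl_surfunc_&ymd output and assumes newid is carried along from blcovs. The bandwidth bw, the evaluation grid, and the data sets work.surv_sorted, work.dH, work.grid and work.h_smooth are hypothetical.

```sas
%let bw = 1;   /* hypothetical bandwidth, in the time units of t_end */

proc sort data=work.bl_surfunc_&ymd out=work.surv_sorted;
  by newid t_end;
run;

/* jumps of the step function H(t) = -log S(t) at each event time */
data work.dH;
  set work.surv_sorted;
  by newid t_end;
  if S > 0 then H = -log(S);
  dH = dif(H);
  if first.newid then dH = H;   /* first jump measured from H(0) = 0 */
  if dH > 0;
  keep newid t_end dH;
run;

/* hypothetical evaluation grid from 0 to 10 time units */
data work.grid;
  do t = 0 to 10 by 0.25;
    output;
  end;
run;

/* h(t) = (1/bw) * sum_j K((t - t_j)/bw) * dH_j,
   Epanechnikov kernel K(u) = 0.75*(1 - u**2) for |u| < 1 */
proc sql;
  create table work.h_smooth as
  select d.newid, g.t,
         sum(0.75 * (1 - ((g.t - d.t_end) / &bw)**2) * d.dH) / &bw as h_hat
  from work.grid as g, work.dH as d
  where abs(g.t - d.t_end) < &bw
  group by d.newid, g.t
  order by d.newid, g.t;
quit;
```

Grid points with no event time within one bandwidth do not appear in work.h_smooth (the estimate there is 0), and estimates within one bandwidth of time 0 or the end of follow-up are biased. The bandwidth needs tuning, e.g. by plotting h_hat against t with PROC SGPLOT for a few values of bw.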

Please note that to accommodate time-varying covariates such as gcp_slope, the hazard will need to be calculated as follows:

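  1. Calculate the baseline hazard with the time-varying covariate set to zero in the blcovs data.
  2. Adjust the baseline hazard from step 1 for the time-varying covariate, i.e. multiply it by exp(b*z(t)), where b is the covariate's coefficient and z(t) its value at time t.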
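One way these two steps might look in SAS, as a sketch only. It assumes gcp_slope is the only time-varying covariate, that a participant's gcp_slope is constant within each (t_start, t_end] interval of dem.coxreg_currgcp, and that newid is carried from blcovs into the BASELINE output. Step 2 uses the proportional hazards form h(t) = h0(t | fixed covariates)*exp(b*gcp_slope(t)), so each jump of the step-1 cumulative hazard is multiplied by exp(b*gcp_slope) for the interval that contains it. The data sets blcovs0, work.pe, work.bl0, work.dH0, work.dH_tv, work.H_tv and the macro variable b_slope are hypothetical.

```sas
/* Step 1: same covariate profiles, with the time-varying covariate set to zero */
data blcovs0;
  set blcovs;
  gcp_slope = 0;
run;

ods output ParameterEstimates = work.pe;   /* keep the coefficient estimates */
proc phreg data = dem.coxreg_currgcp;
  class female0 (ref="Yes") racebpcog (ref="Non-Hispanic White")
        educ (ref="College graduate or more (Technical School Certificate, Associate Degree, Bachelor's Degree, Graduate or Professional School)");
  model (t_start,t_end)*deminc(0) = gcp_bl gcp_slope age female0 educ racebpcog;
  baseline covariates=blcovs0 out=work.bl0 cumhaz=H0;
run;

/* coefficient b for gcp_slope, stored in a macro variable */
data _null_;
  set work.pe;
  where upcase(Parameter) = 'GCP_SLOPE';
  call symputx('b_slope', Estimate);
run;

/* Step 2a: jumps of the step-1 cumulative hazard at each event time */
proc sort data=work.bl0;
  by newid t_end;
run;

data work.dH0;
  set work.bl0;
  by newid t_end;
  dH0 = dif(H0);
  if first.newid then dH0 = H0;   /* first jump measured from H0(0) = 0 */
  if dH0 > 0;
  keep newid t_end dH0;
run;

/* Step 2b: scale each jump by exp(b * gcp_slope) from the interval containing it */
proc sql;
  create table work.dH_tv as
  select j.newid, j.t_end,
         j.dH0 * exp(&b_slope * v.gcp_slope) as dH
  from work.dH0 as j, dem.coxreg_currgcp as v
  where j.newid = v.newid
    and v.t_start < j.t_end
    and j.t_end <= v.t_end
  order by j.newid, j.t_end;
quit;

/* Step 2c: accumulate the scaled jumps into H(t) and S(t) = exp(-H(t)) */
data work.H_tv;
  set work.dH_tv;
  by newid t_end;
  if first.newid then H = 0;
  H + dH;
  S = exp(-H);
run;
```

Jumps after a participant's last t_end are dropped, since gcp_slope is not observed there. The smoothed hazard can then be obtained by applying kernel smoothing to the dH values in work.dH_tv.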

Hope it helps. We can talk if needed.
