UM-COG-HSR / BP-COG_Aim2_Dementia

Cox model using time-varying cognition to predict dementia hazard
0 stars 0 forks source link

BPCOG Aim 2 Dementia model #4

Open natilton opened 4 years ago

natilton commented 4 years ago

STUDY

BPCOG Aim 2 - Dementia

STUDY AIM

Create time-to-dementia cox regression model and produce survivor function data.

STUDY POPULATION

Pooled subset of BPCOG cohort participants from ARIC, CHS and FOS

DEPENDENT VARIABLE

The macro code and commented instructions for use are also stored in this repository BP-COG_Aim2_Dementia/Analysis Macros/

Documentation and original files are here https://www.jstatsoft.org/article/view/v061c01

Analysis file: dem.coxreg_currgcp

Model statement for the regression

model (t_start,t_end)*deminc(0) = gcp_bl interval gcp_slope age0 female0 educ racebpcog;

This is how I tried to run the macro:

proc phreg data = dem.coxreg_currgcp outest = ESTS;
class female0 racebpcog educ;       
model (t_start,t_end)*deminc(0) = gcp_bl interval gcp_slope age0 female0 educ racebpcog;
run;

data blcovs;
input age0 female0 educ racebpcog gcp_bl interval gcp_slope;
format female0 sexfmt. educ educ. racebpcog racebpcog.;
datalines;
60 1 5 2 50 1 0
;
run; 

%coxtvc (data = dem.coxreg_currgcp,
                y = (t_start, t_end)*deminc(0),
                x = gcp_bl interval gcp_slope age0 female0 educ racebpcog,
                tvvar = interval gcp_slope,
                nontvvar = gcp_bl age0 female0 educ racebpcog,
                covs = blcovs,
        ests = ests,
                addstmts = %str(class female0 racebpcog educ;));

SAS Log

424  proc phreg data = dem.coxreg_currgcp outest = ESTS;
425  class female0 racebpcog educ;
426  model (t_start,t_end)*deminc(0) = gcp_bl interval gcp_slope age0 female0 educ racebpcog;
427  run;

NOTE: 143 observations were deleted due either to missing or invalid values for the time,
      censoring, frequency or explanatory variables or to invalid operations in generating the
      values for some of the explanatory variables.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The data set WORK.ESTS has 1 observations and 15 variables.
NOTE: PROCEDURE PHREG used (Total process time):
      real time           4:20.49
      cpu time            4:12.64

428
429  data blcovs;
430  input age0 female0 educ racebpcog gcp_bl interval gcp_slope;
431  format female0 sexfmt. educ educ. racebpcog racebpcog.;
432  datalines;

NOTE: The data set WORK.BLCOVS has 1 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.83 seconds
      cpu time            0.01 seconds

434  ;
435  run;

445  %coxtvc(data = dem.coxreg_currgcp,
446                     y = (t_start, t_end)*deminc(0),
447                     x = gcp_bl interval gcp_slope age0 female0 educ racebpcog,
448                     tvvar = interval gcp_slope,
449                     nontvvar = gcp_bl age0 female0 educ racebpcog,
450                     covs = blcovs,
451                     ests = ests,
452                     addstmts = %str(class female0 racebpcog educ;));

NOTE: There were 39282 observations read from the data set DEM.COXREG_CURRGCP.
NOTE: The data set WORK._MEANS_ has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           3.69 seconds
      cpu time            0.06 seconds

NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      9:79
NOTE: There were 1 observations read from the data set WORK._MEANS_.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

NOTE: 80 observations were deleted due either to missing or invalid values for the time,
      censoring, frequency or explanatory variables or to invalid operations in generating the
      values for some of the explanatory variables.
WARNING: Convergence was not attained in 25 iterations.
ERROR: Floating Point Overflow.
ERROR: Termination due to Floating Point Exception
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK._MEANS_ may be incomplete.  When this step was stopped there were 0
         observations and 8 variables.
WARNING: Data set WORK._MEANS_ was not replaced because this step was stopped.
NOTE: PROCEDURE PHREG used (Total process time):
      real time           2:31.05
      cpu time            2:23.15
natilton commented 4 years ago

phreg output 200527.docx

natilton commented 4 years ago

Hi Andrzej,

Jim is trying to hoping to use the model results to be able to calculate cumulative hazard for test cases. I understand how to make SAS produce the survivor function data for test cases, but he's hoping to be able to use predicted hazard to see if the model fits. As far as I know, SAS doesn't produce the baseline hazard function, i.e. h0(t) in hazard = h0(t)*exp(BX)

So I don't see how to calculate predicted values. In searching for a solution, I came across this page regarding the assess statement. https://communities.sas.com/t5/Statistical-Procedures/Estimate-the-baseline-hazard-ratio-and-assess-predictive/td-p/177517

Do you think this would help?

Here's the SAS code for running the procedure with test cases:

%let BPCOG_path = S:\Intmed_Rsrch2\GenMed\Restricted\BP COG;
%let BPCOG_DEMpath = &BPCOG_path.\Aim 2\Dementia Model;
libname storemac "&BPCOG_DEMpath.\SAS Compiled Macros";
libname fmts "&BPCOG_path.\Aim 1\Data Management\Data\formats";
libname dem "&BPCOG_DEMpath.\SAS Data";
options fmtsearch=(fmts) nofmterr mstored sasmstore=storemac;

%macro today_YYMMDD();
%let z=0;
%let y2=%sysfunc(today(),year2.);
%let m2=%sysfunc(today(),month2.);
%let d2=%sysfunc(today(),day2.);
%if %eval(&m2)<=9 %then %let m2 = &z&m2;
%if %eval(&d2)<=9 %then %let d2 = &z&d2;
%let ymd = &y2&m2&d2;
&ymd;
%mend;

%let ymd = %today_YYMMDD();

/*randomly choose 2 participants from each cohort*/
proc sort data=dem.coxreg_currgcp out=currgcp; by newid t_end; run;
data currgcp;
set currgcp;
by newid;
if first.newid;
run;

  PROC SURVEYSELECT DATA=currgcp OUT=aricsamp METHOD=SRS
  SAMPSIZE=2 SEED=1234567;
  where studyname='aric';
  RUN;

  PROC SURVEYSELECT DATA=currgcp OUT=chssamp METHOD=SRS
  SAMPSIZE=2 SEED=1234567;
  where studyname='chs';
  RUN;

  PROC SURVEYSELECT DATA=currgcp OUT=fossamp METHOD=SRS
  SAMPSIZE=2 SEED=1234567;
  where studyname='fos';
  RUN;

data blcovs;
set aricsamp chssamp fossamp;
format female0 yna. educ educ. racebpcog racebpcog.;
keep newid age female0 educ racebpcog gcp_bl gcp_slope;
run;

proc phreg data = dem.coxreg_currgcp;
class female0 (ref="Yes") racebpcog (ref="Non-Hispanic White") 
educ (ref="College graduate or more (Technical School Certificate, Associate Degree, Bachelor's Degree, Graduate or Professional School)");     
model (t_start,t_end)*deminc(0) = gcp_bl gcp_slope age female0 educ racebpcog;
baseline covariates=blcovs out=work.bl_surfunc_&ymd survival=S lower=S_lower upper=S_upper;
run;

/* produce 1-year survival prediction */
data work.bl_testcases_&ymd;
set work.Bl_surfunc_&ymd;
if t_end > 1 and t_end < 1.07;
run;

Thanks, Nick

agalecki commented 4 years ago

Hi Nick,

RE: "calculate cumulative hazard for test cases"

Cumulative hazard H(t) can be derived from survival S(t) using formula S(t) = exp(-H(t)), so this should not be a problem. It can be done within bl_surfunc_&ymd dataset.

To obtain hazard function h(t), we can simply differentiate H(t) with respect to time, i.e. h(t) = dH(t)/dt.

Obtaining hazard function h(t) from Cox regression is problematic, because S(t) and H(t) are not specified. Consequently, they are estimated with step functions and derivatives .wrt time are not meaningful. To circumvent this issue it will be necessary to smooth out H(t) function (e.g. kernel- smoothing) and then calculate numeric derivatives wrt time.

Please note that to accommodate time-varying covariates, e.g. gcp_slope it will be necessary to calculate hazard as follows.

  1. Calculate baseline hazard assuming that time-varying covariate is zero in blcovs data
  2. Adjust baseline hazard estimated in 1. to accommodate time-varying covariate

Hope it helps We may talk, if needed

Andrzej