2016-08-24-Dealing with MisClassification

IALSA / ialsa-2016-amsterdam

Multi-study and multivariate evaluation of healthy life expectancy (HLE): An IALSA workshop on multistate modeling using R

GNU General Public License v2.0

0 stars 0 forks source link

2016-08-24-Dealing with MisClassification #29

Open andkov opened 7 years ago

andkov commented 7 years ago

Currently, my models of MAP data cannot deal with backward transition by the means of misspecification in matrix E .

> # clean data after encoding multistates
> # each person has at least two data points
> # and non-missing age at all time points
> ds <- ds_clean %>% 
+   dplyr::mutate(
+     male = as.numeric(male),
+     age    = age - 80,
+     age_bl = age_bl - 80
+   ) #%>% 
>   # dplyr::select(id, male, age_bl, edu, educat, age, state)
> head(ds)
     id      age_bl male edu         age state presumed_alive educat   educatF firstobs
1  9121 -0.03011636    0  12 -0.03011636     1           TRUE      1 >11 years        1
2  9121 -0.03011636    0  12  1.08145106     1           TRUE      1 >11 years        0
3  9121 -0.03011636    0  12  1.61259411     1           TRUE      1 >11 years        0
4  9121 -0.03011636    0  12  2.59548255     1           TRUE      1 >11 years        0
5  9121 -0.03011636    0  12  3.62217659     1           TRUE      1 >11 years        0
6 33027  1.00752909    0  14  1.00752909     1           TRUE      1 >11 years        1
> # state frequencies
> table(ds$state)

  -2   -1    1    2    3    4 
  78  142 6629 1584 1155  680 
> # observed transitions
> knitr::kable(msm::statetable.msm(state,id,ds_ms))
|   | -2| -1|    1|   2|   3|   4|
|:--|--:|--:|----:|---:|---:|---:|
|-2 | 32|  0|    0|   0|   0|   0|
|-1 |  0| 25|   27|  13|  27|  50|
|1  | 32| 59| 4855| 715| 120| 251|
|2  |  8| 20|  534| 478| 257| 146|
|3  |  6| 34|   24|  97| 649| 233|
> # define matrix with starting values for transition
> q <- .01
> Q <- rbind( c(0,  q,  q,  q), 
+             c(q,  0,  q,  q),
+             c(0,  q,  0,  q), 
+             c(0,  0,  0,  0)) 
> # misclassification matrix
> E <- rbind( c(0,  0,  0,  0),  
+             c(0,  0,  0,  0), 
+             c(0,  0,  0,  0),
+             c(0,  0,  0,  0) )
> # transition names
> qnames = c(
+   "Healthy - Mild",  # q12
+   "Health - Severe", # q13
+   "Healthy - Dead",  # q14
+   "Mild - Healthy",  # q21  
+   "Mild - Severe",   # q23
+   "Mild - Dead",     # q24
+   "Severe - Mild",   # q32
+   "Severe - Dead"    # q34
+ )
> # set estimation options 
> digits = 2
> cov_names  = "age"   # string with covariate names
> method_    = "BFGS"  # alternatively, if does not converge "Nedler-Mead"
> constraint = NULL    # additional model constraints
> fixedpars  = NULL    # fixed parameters
> # construct covariate list
> covariates <- as.formula(paste0("~",cov_names))
> # estimate model
> model_A <- msm(
+   formula       = state ~ age, 
+   subject       = id, 
+   data          = ds, 
+   center        = FALSE,
+   qmatrix       = Q, 
+   ematrix       = E,
+   death         = TRUE, 
+   covariates    = covariates,
+   censor        = c(-1,-2), 
+   censor.states = list(c(1,2,3), c(1,2,3)), 
+   method        = method_,
+   constraint    = constraint,
+   fixedpars     = fixedpars,
+   control       = list(trace=0,REPORT=1,maxit=1000,fnscale=10000)
+ )
> examine_multistate(model_A)
---------------------------------------
Model ---  with covariates: ~age
and constraints:
NULL
and fixedpars:
named integer(0)
Convergence code = 0 

-2loglik = 12972.04 
AIC = 13004.04 

Parameter estimats and SEs:
--------------------------------------- 
                 q      p    se Wald.ChiSq Pr.ChiSq
1   Healthy - Mild -1.717 0.052   1090.270    0.000
2  Health - Severe -6.763 1.071     39.875    0.000
3   Healthy - Dead -3.712 0.123    910.764    0.000
4   Mild - Healthy -0.248 0.058     18.283    0.000
5    Mild - Severe -1.285 0.091    199.399    0.000
6      Mild - Dead -2.843 0.286     98.815    0.000
7    Severe - Mild -1.774 0.131    183.385    0.000
8    Severe - Dead -1.849 0.127    211.966    0.000
9   Healthy - Mild  0.080 0.006    177.778    0.000
10 Health - Severe -0.097 0.074      1.718    0.190
11  Healthy - Dead  0.082 0.015     29.884    0.000
12  Mild - Healthy -0.018 0.008      5.062    0.024
13   Mild - Severe  0.047 0.010     22.090    0.000
14     Mild - Dead  0.064 0.028      5.224    0.022
15   Severe - Mild -0.008 0.014      0.327    0.568
16   Severe - Dead  0.065 0.011     34.917    0.000
Warning messages:
1: In wald[1:sum(names(p) == "qbase")] <- wald :
  number of items to replace is not a multiple of replacement length
2: In pvalue[1:sum(names(p) == "qbase")] <- pvalue :
  number of items to replace is not a multiple of replacement length

However, after adjusting the E matrix to describe transition from 3 to 1 as model misspecification, msm() call generates the following error:

> E <- rbind( c(0,  0,  0,  0),  
+             c(0,  0,  0,  0), 
+             c(.1,  0,  0,  0),
+             c(0,  0,  0,  0) )
> model_B <- msm(
+   formula       = state ~ age, 
+   subject       = id, 
+   data          = ds, 
+   center        = FALSE,
+   qmatrix       = Q, 
+   ematrix       = E,
+   death         = TRUE, 
+   covariates    = covariates,
+   censor        = c(-1,-2), 
+   censor.states = list(c(1,2,3), c(1,2,3)), 
+   method        = method_,
+   constraint    = constraint,
+   fixedpars     = fixedpars,
+   control       = list(trace=0,REPORT=1,maxit=1000,fnscale=10000)
+ )
Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1,  : 
  initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)

andkov commented 7 years ago

When q31 is not fixed to 0 and without misspecification the model estimates just fine:

> # define matrix with starting values for transition
> q <- .01
> Q <- rbind( c(0,  q,  q,  q), 
+             c(q,  0,  q,  q),
+             c(q,  q,  0,  q), 
+             c(0,  0,  0,  0)) 
> # misclassification matrix
> E <- rbind( c(0,  0,  0,  0),  
+             c(0,  0,  0,  0), 
+             c(0,  0,  0,  0),
+             c(0,  0,  0,  0) )
> # transition names
> qnames = c(
+   "Healthy - Mild",  # q12
+   "Health - Severe", # q13
+   "Healthy - Dead",  # q14
+   "Mild - Healthy",  # q21  
+   "Mild - Severe",   # q23
+   "Mild - Dead",     # q24
+   "Severe - Healthy",# q31
+   "Severe - Mild",   # q32
+   "Severe - Dead"    # q34
+ )
> # set estimation options 
> digits = 2
> cov_names  = "age"   # string with covariate names
> method_    = "BFGS"  # alternatively, if does not converge "Nedler-Mead"
> constraint = NULL    # additional model constraints
> fixedpars  = NULL    # fixed parameters
> # construct covariate list
> covariates <- as.formula(paste0("~",cov_names))
> # estimate model
> modelC <- msm(
+   formula       = state ~ age, 
+   subject       = id, 
+   data          = ds, 
+   center        = FALSE,
+   qmatrix       = Q, 
+   ematrix       = E,
+   death         = TRUE, 
+   covariates    = covariates,
+   censor        = c(-1,-2), 
+   censor.states = list(c(1,2,3), c(1,2,3)), 
+   method        = method_,
+   constraint    = constraint,
+   fixedpars     = fixedpars,
+   control       = list(trace=0,REPORT=1,maxit=1000,fnscale=10000)
+ )
> examine_multistate(model_C)
---------------------------------------
Model ---  with covariates: ~age
and constraints:
NULL
and fixedpars:
named integer(0)
Convergence code = 0 

-2loglik = 12974.6 
AIC = 13010.6 

Parameter estimats and SEs:
--------------------------------------- 
                  q      p    se Wald.ChiSq Pr.ChiSq
1    Healthy - Mild -1.725 0.053   1059.318    0.000
2   Health - Severe -5.803 0.542    114.632    0.000
3    Healthy - Dead -3.738 0.126    880.111    0.000
4    Mild - Healthy -0.251 0.059     18.099    0.000
5     Mild - Severe -1.258 0.091    191.108    0.000
6       Mild - Dead -2.854 0.290     96.853    0.000
7  Severe - Healthy -4.726 0.691     46.777    0.000
8     Severe - Mild -1.807 0.143    159.678    0.000
9     Severe - Dead -1.791 0.124    208.616    0.000
10   Healthy - Mild  0.081 0.007    133.898    0.000
11  Health - Severe -0.033 0.062      0.283    0.595
12   Healthy - Dead  0.083 0.015     30.618    0.000
13   Mild - Healthy -0.017 0.008      4.516    0.034
14    Mild - Severe  0.044 0.010     19.360    0.000
15      Mild - Dead  0.065 0.028      5.389    0.020
16 Severe - Healthy -0.166 0.072      5.316    0.021
17    Severe - Mild -0.002 0.016      0.016    0.901
18    Severe - Dead  0.061 0.011     30.752    0.000
Warning messages:
1: In wald[1:sum(names(p) == "qbase")] <- wald :
  number of items to replace is not a multiple of replacement length
2: In pvalue[1:sum(names(p) == "qbase")] <- pvalue :
  number of items to replace is not a multiple of replacement length

And once again, the misspecification of q31 in the E matrix cause the same error

> E <- rbind( c(0,  0,  0,  0),  
+             c(0,  0,  0,  0), 
+             c(.1,  0,  0,  0),
+             c(0,  0,  0,  0) )
> model_D <- msm(
+   formula       = state ~ age, 
+   subject       = id, 
+   data          = ds, 
+   center        = FALSE,
+   qmatrix       = Q, 
+   ematrix       = E,
+   death         = TRUE, 
+   covariates    = covariates,
+   censor        = c(-1,-2), 
+   censor.states = list(c(1,2,3), c(1,2,3)), 
+   method        = method_,
+   constraint    = constraint,
+   fixedpars     = fixedpars,
+   control       = list(trace=0,REPORT=1,maxit=1000,fnscale=10000)
+ )
Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1,  : 
  initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)

andkov commented 7 years ago

after chatting with @robsonucl

The general rule of using matrix E to define misspecification:

The q-element of the transition that needs to be defined as misspecified should be set to 0 in the Q matrix and to non-zero in the E matrix.

To demonstrate, given a Q matrix of the four state model

Q <- rbind( c(0,  q,  q,  q),
            c(q,  0,  q,  q),
            c(0,  0,  0,  q),
            c(0,  0,  0,  0))

and the observed state table of

|   | -2| -1|    1|   2|   3|   4|
|:--|--:|--:|----:|---:|---:|---:|
|-2 | 32|  0|    0|   0|   0|   0|
|-1 |  0| 25|   27|  13|  27|  50|
|1  | 32| 59| 4855| 715| 120| 251|
|2  |  8| 20|  534| 478| 257| 146|
|3  |  6| 34|   24|  97| 649| 233|

if we wish to identify the transition q32 as misspecified, the corresponding E matrix should look like this:

E <- rbind( c(0,  0,  0,  0),  
            c(0,  0,  0,  0), 
            c(0,  .1, 0,  0),
            c(0,  0,  0,  0) )

If error persists (as it is in my case)

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1,  : 
  initial value in 'vmmin' is not finite

one potential hypothesis suggested by @robsonucl was data integrity, specifically in regards to intermediate missing states (-1).

Investigation continues.

andkov commented 7 years ago

The logic of specifying the misclassification matrix E (described above) seems incomplete.

In the Example III of the ELECT vignette (p.13) gives an example where non-zero starting values in E are matched by non-zero values in Q (e.g. q12, q23).

State table: 
       to
  from 1    2   3   4
  1    1367 204 44  148
  2    46   134 54  48
  3    4    13  107 55

q <-0.01;
Q <- rbind( 
  c(0, q, 0, q), 
  c(0, 0, q, q),
  c(0, 0, 0, q),
  c(0, 0, 0, 0) )

ematrix <- rbind( 
  c( 0, .1,  0, 0), 
  c(.1,  0, .1, 0), 
  c( 0, .1,  0, 0),
  c( 0,  0,  0, 0) )

model <- msm(
  state~age, 
  subject=PTNUM, 
  data=cav, 
  center=FALSE,
  qmatrix=Q, 
  death=TRUE, 
  covariates=~age+dage+sex,
  ematrix=ematrix,
  fixedpars=c(6,8,9,10), 
  method="BFGS",
  control=list(trace=0,REPORT=1,maxit=1000,fnscale=10000)
)

Moreover, it is not clear why the least populous transition (q31, n=4) is not specified as misclassification, while a more populous one (q32, n=13) is. However, the use of

  fixedpars=c(6,8,9,10),

may have something to do with this. In any case, I can use some help with understanding the general logic of specifying misclassification matrix.

@robsonucl , @GracielaMuniz , @annierobi

robsonucl commented 7 years ago

Hi all,

I was talking to Annie about the specification of the ematrix. I am going to basically forward my last email to you. The Q matrix is correct and should be defined as

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

For the ematrix, let "O" denote the observed state and "S" denote the true state, then e23 = P(O = 3 / S = 2). In other words, e23 is the probability that we observe state 3 when it is actually state 2. Since we have misclassification from state 2 to state 3 only, the matrix should be defined as

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0.1,0), c(0, 0, 0, 0), c(0, 0, 0, 0)).

I am attaching a paper that you probably already have. I suggest you to read section 1.6.1 and and 2.14, so you can have a better description of the misclassification matrix.

Please, let me know if your code is running. I can try to run from my laptop and hopefully find a mistake\solution.

Best Robson

On 25 August 2016 at 19:20, Andriy V. Koval notifications@github.com wrote:

after chatting with @robsonucl https://github.com/robsonucl

The general rule of using matrix E to define misspecification:

The q-element of the transition that needs to be defined as misspecified should be set to 0 in the Q matrix and to non-zero in the E matrix.

To demonstrate, given a Q matrix of the four state model

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

and the observed state table of

-2 -1 1 2 3 4

-2 32 0 0 0 0 0

-1 0 25 27 13 27 50

1 32 59 4855 715 120 251

2 8 20 534 478 257 146

3 6 34 24 97 649 233

if we wish to identify the transition q32 as misspecified, the corresponding E matrix should look like this:

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0, 0), c(0, .1, 0, 0), c(0, 0, 0, 0) )

If error persists (as it is in my case)

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1, : initial value in 'vmmin' is not finite

one potential hypothesis suggested by @robsonucl https://github.com/robsonucl was data integrity, specifically in regards to intermediate missing states (-1).

Investigation continues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242490234, or mute the thread https://github.com/notifications/unsubscribe-auth/AUQVjZD2IHgf5fOe8oLiUM-fm9p74Cntks5qjd0KgaJpZM4JsUTa .

	-2	-1	1	2	3	4
-2	32	0	0	0	0	0
-1	0	25	27	13	27	50
1	32	59	4855	715	120	251
2	8	20	534	478	257	146
3	6	34	24	97	649	233

Robson

annierobi commented 7 years ago

Thanks Robson. All the models I have run do not include the 1 to 3 transition. My Q matrix is: Q <- rbind( c(0, q, 0, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

rather than:

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

What have others modeled? Can we actually model the 1 to 3 transition?

Thanks

Annie

On Sun, Aug 28, 2016 at 7:47 AM, robsonucl notifications@github.com wrote:

Hi all,

I was talking to Annie about the specification of the ematrix. I am going to basically forward my last email to you. The Q matrix is correct and should be defined as

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

For the ematrix, let "O" denote the observed state and "S" denote the true state, then e23 = P(O = 3 / S = 2). In other words, e23 is the probability that we observe state 3 when it is actually state 2. Since we have misclassification from state 2 to state 3 only, the matrix should be defined as

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0.1,0), c(0, 0, 0, 0), c(0, 0, 0, 0)).

I am attaching a paper that you probably already have. I suggest you to read section 1.6.1 and and 2.14, so you can have a better description of the misclassification matrix.

Please, let me know if your code is running. I can try to run from my laptop and hopefully find a mistake\solution.

Best Robson

On 25 August 2016 at 19:20, Andriy V. Koval notifications@github.com wrote:

after chatting with @robsonucl https://github.com/robsonucl

The general rule of using matrix E to define misspecification:

The q-element of the transition that needs to be defined as misspecified should be set to 0 in the Q matrix and to non-zero in the E matrix.

To demonstrate, given a Q matrix of the four state model

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

and the observed state table of

-2 -1 1 2 3 4

-2 32 0 0 0 0 0

-1 0 25 27 13 27 50

1 32 59 4855 715 120 251

2 8 20 534 478 257 146

3 6 34 24 97 649 233

if we wish to identify the transition q32 as misspecified, the corresponding E matrix should look like this:

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0, 0), c(0, .1, 0, 0), c(0, 0, 0, 0) )

If error persists (as it is in my case)

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1, : initial value in 'vmmin' is not finite

one potential hypothesis suggested by @robsonucl https://github.com/robsonucl was data integrity, specifically in regards to intermediate missing states (-1).

Investigation continues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242490234, or mute the thread https://github.com/notifications/unsubscribe- auth/AUQVjZD2IHgf5fOe8oLiUM-fm9p74Cntks5qjd0KgaJpZM4JsUTa .

Robson

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242970483, or mute the thread https://github.com/notifications/unsubscribe-auth/AKow3dL40Dv2duviWL9HA0RsmlvwAk8zks5qkXU_gaJpZM4JsUTa .

	-2	-1	1	2	3	4
-2	32	0	0	0	0	0
-1	0	25	27	13	27	50
1	32	59	4855	715	120	251
2	8	20	534	478	257	146
3	6	34	24	97	649	233

robsonucl commented 7 years ago

Sorry Annie. I copied the matrix from the last email. The entry q13 = 0.

Robson

On Sunday, 28 August 2016, annierobi notifications@github.com wrote:

Thanks Robson. All the models I have run do not include the 1 to 3 transition. My Q matrix is: Q <- rbind( c(0, q, 0, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

rather than:

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

What have others modeled? Can we actually model the 1 to 3 transition?

Thanks

Annie

On Sun, Aug 28, 2016 at 7:47 AM, robsonucl <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Hi all,

I was talking to Annie about the specification of the ematrix. I am going to basically forward my last email to you. The Q matrix is correct and should be defined as

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

For the ematrix, let "O" denote the observed state and "S" denote the true state, then e23 = P(O = 3 / S = 2). In other words, e23 is the probability that we observe state 3 when it is actually state 2. Since we have misclassification from state 2 to state 3 only, the matrix should be defined as

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0.1,0), c(0, 0, 0, 0), c(0, 0, 0, 0)).

I am attaching a paper that you probably already have. I suggest you to read section 1.6.1 and and 2.14, so you can have a better description of the misclassification matrix.

Please, let me know if your code is running. I can try to run from my laptop and hopefully find a mistake\solution.

Best Robson

On 25 August 2016 at 19:20, Andriy V. Koval <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

after chatting with @robsonucl https://github.com/robsonucl

The general rule of using matrix E to define misspecification:

The q-element of the transition that needs to be defined as misspecified should be set to 0 in the Q matrix and to non-zero in the E matrix.

To demonstrate, given a Q matrix of the four state model

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

and the observed state table of

-2 -1 1 2 3 4

-2 32 0 0 0 0 0

-1 0 25 27 13 27 50

1 32 59 4855 715 120 251

2 8 20 534 478 257 146

3 6 34 24 97 649 233

if we wish to identify the transition q32 as misspecified, the corresponding E matrix should look like this:

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0, 0), c(0, .1, 0, 0), c(0, 0, 0, 0) )

If error persists (as it is in my case)

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1, : initial value in 'vmmin' is not finite

one potential hypothesis suggested by @robsonucl https://github.com/robsonucl was data integrity, specifically in regards to intermediate missing states (-1).

Investigation continues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242490234, or mute the thread https://github.com/notifications/unsubscribe- auth/AUQVjZD2IHgf5fOe8oLiUM-fm9p74Cntks5qjd0KgaJpZM4JsUTa .

Robson

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242970483, or mute the thread https://github.com/notifications/unsubscribe-auth/ AKow3dL40Dv2duviWL9HA0RsmlvwAk8zks5qkXU_gaJpZM4JsUTa .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242979773, or mute the thread https://github.com/notifications/unsubscribe-auth/AUQVjYMnEKU61_uMu-kJCp9DohbMhyJXks5qkaRRgaJpZM4JsUTa .

	-2	-1	1	2	3	4
-2	32	0	0	0	0	0
-1	0	25	27	13	27	50
1	32	59	4855	715	120	251
2	8	20	534	478	257	146
3	6	34	24	97	649	233

Robson

annierobi commented 7 years ago

Great. Thanks.

Annie

On Sun, Aug 28, 2016 at 11:14 AM, robsonucl notifications@github.com wrote:

Sorry Annie. I copied the matrix from the last email. The entry q13 = 0.

Robson

On Sunday, 28 August 2016, annierobi notifications@github.com wrote:

Thanks Robson. All the models I have run do not include the 1 to 3 transition. My Q matrix is: Q <- rbind( c(0, q, 0, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

rather than:

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

What have others modeled? Can we actually model the 1 to 3 transition?

Thanks

Annie

On Sun, Aug 28, 2016 at 7:47 AM, robsonucl <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Hi all,

I was talking to Annie about the specification of the ematrix. I am going to basically forward my last email to you. The Q matrix is correct and should be defined as

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

For the ematrix, let "O" denote the observed state and "S" denote the true state, then e23 = P(O = 3 / S = 2). In other words, e23 is the probability that we observe state 3 when it is actually state 2. Since we have misclassification from state 2 to state 3 only, the matrix should be defined as

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0.1,0), c(0, 0, 0, 0), c(0, 0, 0, 0)).

I am attaching a paper that you probably already have. I suggest you to read section 1.6.1 and and 2.14, so you can have a better description of the misclassification matrix.

Please, let me know if your code is running. I can try to run from my laptop and hopefully find a mistake\solution.

Best Robson

On 25 August 2016 at 19:20, Andriy V. Koval <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');>

wrote:

after chatting with @robsonucl https://github.com/robsonucl

The general rule of using matrix E to define misspecification:

The q-element of the transition that needs to be defined as misspecified should be set to 0 in the Q matrix and to non-zero in the E matrix.

To demonstrate, given a Q matrix of the four state model

Q <- rbind( c(0, q, q, q), c(q, 0, q, q), c(0, 0, 0, q), c(0, 0, 0, 0))

and the observed state table of

-2 -1 1 2 3 4

-2 32 0 0 0 0 0

-1 0 25 27 13 27 50

1 32 59 4855 715 120 251

2 8 20 534 478 257 146

3 6 34 24 97 649 233

if we wish to identify the transition q32 as misspecified, the corresponding E matrix should look like this:

E <- rbind( c(0, 0, 0, 0), c(0, 0, 0, 0), c(0, .1, 0, 0), c(0, 0, 0, 0) )

If error persists (as it is in my case)

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1, : initial value in 'vmmin' is not finite

one potential hypothesis suggested by @robsonucl https://github.com/robsonucl was data integrity, specifically in regards to intermediate missing states (-1).

Investigation continues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242490234, or mute the thread https://github.com/notifications/unsubscribe- auth/AUQVjZD2IHgf5fOe8oLiUM-fm9p74Cntks5qjd0KgaJpZM4JsUTa .

Robson

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242970483, or mute the thread https://github.com/notifications/unsubscribe-auth/ AKow3dL40Dv2duviWL9HA0RsmlvwAk8zks5qkXU_gaJpZM4JsUTa .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242979773, or mute the thread https://github.com/notifications/unsubscribe-auth/AUQVjYMnEKU61_uMu- kJCp9DohbMhyJXks5qkaRRgaJpZM4JsUTa

.

Robson

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242980102, or mute the thread https://github.com/notifications/unsubscribe-auth/AKow3VR15IDFbzYkUpS6zVnE9gH9Zchsks5qkaXWgaJpZM4JsUTa .

	-2	-1	1	2	3	4
-2	32	0	0	0	0	0
-1	0	25	27	13	27	50
1	32	59	4855	715	120	251
2	8	20	534	478	257	146
3	6	34	24	97	649	233

andkov commented 7 years ago

@robsonucl, could you please post the citation to the paper you mention?

robsonucl commented 7 years ago

Sure, here it is

Jackson, Christopher. "Multi-state modelling with R: the msm package." (2016)

Robson

On 28 August 2016 at 17:45, Andriy V. Koval notifications@github.com wrote:

@robsonucl https://github.com/robsonucl, could you please post the citation to the paper you mention?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242984923, or mute the thread https://github.com/notifications/unsubscribe-auth/AUQVjR8YH_ej7n_muf44mLWU-UD86PEWks5qkbspgaJpZM4JsUTa .

Robson

annierobi commented 7 years ago

Has everyone centered age and year of birth?

I had initially transformed age to prevent numerical problems with estimation using without centering age and year of birth.

data$age <- data$age/75

I then re-ran the models with age and year of birth centered but am unsure what everyone else is doing.

data$age <- data$age - 80 data$ybirth <- data$ybirth - 1900

Thank you

Annie

On Sun, Aug 28, 2016 at 12:55 PM, robsonucl notifications@github.com wrote:

Sure, here it is

Jackson, Christopher. "Multi-state modelling with R: the msm package." (2016)

Robson

On 28 August 2016 at 17:45, Andriy V. Koval notifications@github.com wrote:

@robsonucl https://github.com/robsonucl, could you please post the citation to the paper you mention?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/ 29#issuecomment-242984923, or mute the thread https://github.com/notifications/unsubscribe- auth/AUQVjR8YH_ej7n_muf44mLWU-UD86PEWks5qkbspgaJpZM4JsUTa .

Robson

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-242985466, or mute the thread https://github.com/notifications/unsubscribe-auth/AKow3XQDsBAg4GNH51YXpnMaKHquN_Y_ks5qkb18gaJpZM4JsUTa .

andkov commented 7 years ago

@annierobi, I have centered both age at visit and cohort (I am using age at baseline for cohort effect) at 80.

annierobi commented 7 years ago

In addition to age centered at age 80, did you also transform like this:

data$age <- data$age/75

My models run nicely with all covariates included if I only transform (data$age <- data$age/75). When I try also centering age I run into problems.

On Wed, Aug 31, 2016 at 2:59 PM, Andriy V. Koval notifications@github.com wrote:

@annierobi https://github.com/annierobi, I have centered both age at visit and cohort (I am using age at baseline for cohort effect) at 80.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IALSA/ialsa-2016-amsterdam/issues/29#issuecomment-243865850, or mute the thread https://github.com/notifications/unsubscribe-auth/AKow3XzgHdf7HrdoRSBdW4GgWXYu7mmyks5qlc8igaJpZM4JsUTa .

andkov commented 7 years ago

No, i've seen Ardo doing data$age/20 in his example(see line 14), which seems to be serving the same function : avoid numerical issues during the estimation.

andkov commented 7 years ago

@annierobi , @robsonucl

I think I have figured out the misclassification error. Jackson (2016, p.38) says

The true state for every patient at the date of transplant is known to be “CAV-free”, not misclassified. To indicate this we use the argument obstrue to msm. This is set to be a variable in the dataset, firstobs, indicating where the observed state equals the true state. This takes the value of 1 at the patient’s first observation, at the transplant date, and 0 elsewhere.

The variable firstobs can be computed in the following way:

# Add first observation indicator
# this creates a new dummy variable "firstobs" with 1 for the frist wave
cat("\nFirst observation indicator is added.\n")
offset <- rep(NA,N)
for(i in 1:N){offset[i] <- min(which(dta$id==subjects[i]))}
firstobs <- rep(0,nrow(dta))
firstobs[offset] <- 1
dta <- cbind(dta,firstobs=firstobs)
head(dta)

This seems to remove the error

Error in optim(method = "BFGS", control = list(trace = 0, REPORT = 1,  : 
  initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)