Open Lachlan-Cribb opened 2 days ago
HI, I found in the beta testing that with Hmisc and the rcspline function you might need to use Hmisc::rcspline.eval instead. I am not sure why. Roger
From: Lachlan Cribb @.> Sent: Monday, October 14, 2024 4:15 AM To: CausalInference/gfoRmula @.> Cc: Subscribed @.***> Subject: [CausalInference/gfoRmula] Restricted cubic splines (Issue #39)
Thanks for the great package!
The paper in Patterns and the vignettes describe modelling variables with restricted cubic splines, using the rcspline.eval() function from Hmisc. I've noticed that rcspline.eval() isn't returning what I would expect. For example, if you fit a model including a spline with three knots, only a single parameter is estimated, rather than the expected 2.
Here is an example putting a spline on L2 in the model for A:
library(gfoRmula) library(Hmisc)
id <- "id" time_points <- 7 time_name <- "t0" covnames <- c("L1", "L2", "A") outcome_name <- "Y" outcome_type <- "survival" covtypes <- c("binary", "bounded normal", "binary") histories <- c(lagged, lagavg) histvars <- list(c("A", "L1", "L2"), c("L1", "L2")) covparams <- list(covmodels = c( L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0, L2 ~ lag1_A + L1 + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0, A ~ lag1_A + L1 + rcspline.eval(L2, knots = c(-1,0,1)) + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0 )) ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0 intervention1.A <- list(static, rep(0, time_points)) intervention2.A <- list(static, rep(1, time_points)) int_descript <- c("Never treat", "Always treat") nsimul <- 10000
gform_basic <- gformula( obs_data = basicdata_nocomp, id = id, time_points = time_points, time_name = time_name, covnames = covnames, outcome_name = outcome_name, outcome_type = outcome_type, covtypes = covtypes, covparams = covparams, ymodel = ymodel, intervention1.A = intervention1.A, intervention2.A = intervention2.A, int_descript = int_descript, histories = histories, histvars = histvars, basecovs = c("L3"), nsimul = nsimul, seed = 1234, model_fits = TRUE )
This executes without problem, but looking at the model fit, there is only a single parameter for L2.
summary(gform_basic$fits$A)
Call: stats::glm(formula = stats::as.formula(paste(covmodels[j])), family = eval(parse(text = famtext)), data = obs_data, y = TRUE)
On the other hand, using the rcs() function from the rms package seems to work, giving the expected two parameters:
covparams <- list(covmodels = c( L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0, L2 ~ lag1_A + L1 + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0, A ~ lag1_A + L1 + rcs(L2, c(-1,0,1)) + lag_cumavg1_L1 + lag_cumavg1_L2 + L3 + t0 )) gform_basic <- gformula( obs_data = basicdata_nocomp, id = id, time_points = time_points, time_name = time_name, covnames = covnames, outcome_name = outcome_name, outcome_type = outcome_type, covtypes = covtypes, covparams = covparams, ymodel = ymodel, intervention1.A = intervention1.A, intervention2.A = intervention2.A, int_descript = int_descript, histories = histories, histvars = histvars, basecovs = c("L3"), nsimul = nsimul, seed = 1234, model_fits = TRUE )
summary(gform_basic$fits$A)
Call: stats::glm(formula = stats::as.formula(paste(covmodels[j])), family = eval(parse(text = famtext)), data = obs_data, y = TRUE)
Session info:
sessionInfo() R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Pop!_OS 22.04 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale: [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 [4] LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
time zone: Australia/Melbourne tzcode source: system (glibc)
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] rms_6.8-1 Hmisc_5.1-3 gfoRmula_1.1.0
loaded via a namespace (and not attached): [1] gtable_0.3.5 xfun_0.44 ggplot2_3.5.1 htmlwidgets_1.6.4 [5] lattice_0.22-6 vctrs_0.6.5 tools_4.4.1 generics_0.1.3 [9] sandwich_3.1-0 tibble_3.2.1 fansi_1.0.6 highr_0.11 [13] cluster_2.1.6 pkgconfig_2.0.3 R.oo_1.26.0 Matrix_1.7-0 [17] data.table_1.15.4 checkmate_2.3.1 lifecycle_1.0.4 R.cache_0.16.0 [21] compiler_4.4.1 stringr_1.5.1 MatrixModels_0.5-3 munsell_0.5.1 [25] codetools_0.2-20 SparseM_1.83 quantreg_5.98 htmltools_0.5.8.1 [29] htmlTable_2.4.2 Formula_1.2-5 pillar_1.9.0 MASS_7.3-61 [33] R.utils_2.12.3 rpart_4.1.23 multcomp_1.4-25 nlme_3.1-165 [37] styler_1.10.3 tidyselect_1.2.1 digest_0.6.35 mvtnorm_1.2-5 [41] polspline_1.1.25 stringi_1.8.4 dplyr_1.1.4 purrr_1.0.2 [45] splines_4.4.1 fastmap_1.2.0 grid_4.4.1 colorspace_2.1-0 [49] cli_3.6.2 magrittr_2.0.3 base64enc_0.1-3 survival_3.7-0 [53] utf8_1.2.4 TH.data_1.1-2 foreign_0.8-86 withr_3.0.0 [57] scales_1.3.0 backports_1.5.0 rmarkdown_2.27 nnet_7.3-19 [61] gridExtra_2.3 zoo_1.8-12 R.methodsS3_1.8.2 evaluate_0.24.0 [65] knitr_1.47 rlang_1.1.4 glue_1.7.0 renv_1.0.7 [69] rstudioapi_0.16.0 R6_2.5.1
— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_CausalInference_gfoRmula_issues_39&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=LdWqYgjY9ckccBYvpiAnjE30O4qK4wn7lb-i85xLduo&m=AVxWJXzXPm6uIXaqYcYJfKm3-qkMWrI2sDPXq-aqOS0KQWspppqW8-Edt2IYvIKQ&s=DcQTQxiX47K_pO8NGAtkaXpTIZ-w2kuGJBsZmW0pBGU&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AEZDUJI3FTZTRL2LJHHY2LTZ3N4TNAVCNFSM6AAAAABP4NZ2W2VHI2DSMVQWIX3LMV43ASLTON2WKOZSGU4DKMJYGM2DANI&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=LdWqYgjY9ckccBYvpiAnjE30O4qK4wn7lb-i85xLduo&m=AVxWJXzXPm6uIXaqYcYJfKm3-qkMWrI2sDPXq-aqOS0KQWspppqW8-Edt2IYvIKQ&s=JWCZBaV2e0wfj_NurnIYbWDYBVTt9G9DJ6VCQBFNFtA&e=. You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Roger,
I tried with Hmisc::rcspline.eval and it's the same issue, only a single parameter is estimated. This seems to be an issue with Hmisc. Executing the following line returns a vector rather than a matrix as its documentation suggests:
> str(Hmisc::rcspline.eval(basicdata_nocomp$L2, knots = c(-1,-0.5,0,1)))
num [1:13170, 1:2] 1.72 1.04e-04 2.51e-07 1.51 0.00 ...
- attr(*, "knots")= num [1:4] -1 -0.5 0 1
If it is a Hmisc issue, might be something to be aware of because people may be using rcspline.eval() inside gformula without errors/warnings and not getting the results they should.
Hi, Can you run the non-bootstrap R code in the benchmark package? It is in the gfoRmula-benchmark repository, https://github.com/CausalInference/gfoRmula-benchmark .
I ran this code with the latest version of R for windows in Rstudio and got the correct output for the splines. (Added the model_fits=TRUE option. )
Roger
summary(gform_test$fits$lncd4_v)
Call: stats::glm(formula = stats::as.formula(paste(covmodels[j])), family = eval(parse(text = famtext)), data = obs_data, y = TRUE)
Coefficients:
Hmisc::rcspline.eval(lag1_lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))1 1.339e-01 1.106e-02 12.112 < 2e-16 Hmisc::rcspline.eval(lag1_lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))2 -3.125e-01 3.933e-02 -7.946 1.94e-15 Hmisc::rcspline.eval(lag1_lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))3 2.961e-01 7.394e-02 4.004 6.22e-05 ***
Hmisc::rcspline.eval(lag1_lncd4_v, knots = c(3.912, 4.605, 5.298, 5.858, 6.215))1 5.480e-02 3.964e-02 1.382 0.166900 Hmisc::rcspline.eval(lag1_lncd4_v, knots = c(3.912, 4.605, 5.298, 5.858, 6.215))2 -2.539e-02 1.282e-01 -0.198 0.843059 Hmisc::rcspline.eval(lag1_lncd4_v, knots = c(3.912, 4.605, 5.298, 5.858, 6.215))3 -1.824e-01 1.924e-01 -0.948 0.342961 -5.849e-03 7.945e-04 -7.362 1.82e-13 Hmisc::rcspline.eval(lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))1 -3.236e-01 1.138e-02 -28.442 < 2e-16 Hmisc::rcspline.eval(lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))2 8.323e-01 4.061e-02 20.497 < 2e-16 Hmisc::rcspline.eval(lnrna_v, knots = c(6.215, 7.601, 9.211, 10.597, 11.513))3 -9.240e-01 7.737e-02 -11.943 < 2e-16
From: Lachlan Cribb @.> Sent: Monday, October 14, 2024 4:58 AM To: CausalInference/gfoRmula @.> Cc: Logan, Roger @.>; Comment @.> Subject: Re: [CausalInference/gfoRmula] Restricted cubic splines (Issue #39)
Hi Roger,
I tried with Hmisc::rcspline.eval and it's the same issue, only a single parameter is estimated. This seems to be an issue with Hmisc. Executing the following line returns a vector rather than a matrix as its documentation suggests:
str(Hmisc::rcspline.eval(basicdata_nocomp$L2, knots = c(-1,-0.5,0,1))) num [1:13170, 1:2] 1.72 1.04e-04 2.51e-07 1.51 0.00 ...
- attr(*, "knots")= num [1:4] -1 -0.5 0 1
If it is a Hmisc issue, might be something to be aware of because people may be using rcspline.eval() inside gformula without errors/warnings and not getting the results they should.
— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_CausalInference_gfoRmula_issues_39-23issuecomment-2D2410506061&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=LdWqYgjY9ckccBYvpiAnjE30O4qK4wn7lb-i85xLduo&m=HkcQyxfpz2XC8IxCCkGM1s7AXRoXEJa1htyC-fub4gMm6yL9OUCEVNMUnqR9cNqy&s=RjnSqQJHddYOSA_WQ-XvoSHblUB65sXia7bM0PunVYg&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AEZDUJJDMMICH6C5OPTWJALZ3OBUJAVCNFSM6AAAAABP4NZ2W2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJQGUYDMMBWGE&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=LdWqYgjY9ckccBYvpiAnjE30O4qK4wn7lb-i85xLduo&m=HkcQyxfpz2XC8IxCCkGM1s7AXRoXEJa1htyC-fub4gMm6yL9OUCEVNMUnqR9cNqy&s=W1nSOrv0kDMRGmah5V_Yn6T5Mkk2ppDcgtILS3dIyYk&e=. You are receiving this because you commented.
I have been confused. The rcspline.eval function works fine. I use the rcs() function from the rms package routinely, and there are always k-1 parameters estimated, where k is the number of knots. With rcspline.eval(), there are only k-2 parameters. So I thought there must be a problem.
The difference is because when rcs interanlly calls rcspline.eval, it has the inclx argument set to TRUE (by default it is false). So a matrix with k-1 columns is created. I don't know enough about restricted cubic splines to know what role the inclx argument plays.
Thanks for the great package!
The paper in Patterns and the vignettes describe modelling variables with restricted cubic splines, using the rcspline.eval() function from Hmisc. I've noticed that rcspline.eval() isn't returning what I would expect. For example, if you fit a model including a spline with three knots, only a single parameter is estimated, rather than the expected 2.
Here is an example putting a spline on L2 in the model for A:
This executes without problem, but looking at the model fit, there is only a single parameter for L2.
On the other hand, using the rcs() function from the rms package seems to work, giving the expected two parameters:
Session info: