ebenmichael / augsynth

Augmented Synthetic Control Method
MIT License

"caught segfault" #53

Closed williamlief closed 3 years ago

williamlief commented 3 years ago

I'm having issues with the R session aborting when I try to run augsynth with my data. This is the start of the console message that I see when running in base R:

One outcome and one treatment time found. Running single_augsynth.
Error in KKT matrix LDL factorization when computing the nonzero elements. The problem seems to be non-convexERROR in osqp_setup: KKT matrix factorization.
The problem seems to be non-convex.

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: osqpSolve(private$.work)
...

I have reproduced this on macOS 10.15.7 and on Windows 10. I can run the example augsynth function with the included Kansas data without issue. I have included a reprex that references our data here. Thank you.

library(dplyr)
library(augsynth)

# Get our data
url <- "https://github.com/williamlief/synth_vax/blob/main/data/daily_data_2021-07-04.rds?raw=true"
dat <- readRDS(url(url, method="libcurl"))

dat <- dat %>% 
  select(people_fully_vaccinated_per_hundred, 
         state, centered_time) %>% 
  mutate(centered_time = as.numeric(centered_time), 
         treat = as.numeric(state == "OH" & centered_time >= 0),
         state = as.numeric(factor(state)))

# R aborts here with segfault:
syn <- augsynth(people_fully_vaccinated_per_hundred ~ treat, 
         unit = state, time = centered_time, data = dat)

davidnathanlang commented 3 years ago

`> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lubridate_1.7.10 tidylog_1.0.2 augsynth_0.2.0 stargazer_5.2.2 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[8] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.5 tidyverse_1.3.1 here_1.0.1

loaded via a namespace (and not attached):
[1] httr_1.4.2 pkgload_1.2.1 jsonlite_1.7.2 modelr_0.1.8 Formula_1.2-4 assertthat_0.2.1
[7] cellranger_1.1.0 remotes_2.4.0 sessioninfo_1.1.1 numDeriv_2016.8-1.1 pillar_1.6.1 backports_1.2.1
[13] lattice_0.20-44 glue_1.4.2 digest_0.6.27 osqp_0.6.0.3 rvest_1.0.0 colorspace_2.0-2
[19] sandwich_3.0-1 Matrix_1.3-3 clisymbols_1.2.0 pkgconfig_2.0.3 devtools_2.4.2 broom_0.7.8
[25] haven_2.4.1 scales_1.1.1 processx_3.5.2 farver_2.1.0 generics_0.1.0 usethis_2.0.1
[31] ellipsis_0.3.2 cachem_1.0.5 pacman_0.5.1 withr_2.4.2 cli_3.0.1 magrittr_2.0.1
[37] crayon_1.4.1 readxl_1.3.1 memoise_2.0.0 ps_1.6.0 fs_1.5.0 fansi_0.5.0
[43] nlme_3.1-152 xml2_1.3.2 pkgbuild_1.2.0 dreamerr_1.2.3 fixest_0.9.0 tools_4.1.0
[49] prettyunits_1.1.1 hms_1.1.0 lifecycle_1.0.0 munsell_0.5.0 reprex_2.0.0 callr_3.7.0
[55] compiler_4.1.0 rlang_0.4.11 grid_4.1.0 rstudioapi_0.13 labeling_0.4.2 testthat_3.0.4
[61] gtable_0.3.0 LowRankQP_1.0.4 DBI_1.1.1 R6_2.5.0 zoo_1.8-9 fastmap_1.1.0
[67] utf8_1.2.1 rprojroot_2.0.2 desc_1.3.0 stringi_1.6.2 Rcpp_1.0.7 vctrs_0.3.8
[73] dbplyr_2.1.1 tidyselect_1.1.1 `

davidnathanlang commented 3 years ago

It appears to be a function of a degenerate KKT matrix and/or missing values.
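
One way to check for missing values before calling augsynth is to count missing outcomes per time period and confirm the panel is balanced. This is a minimal sketch in base R on a toy panel. The names `toy`, `missing_by_time`, `bad_times`, and `balanced` are illustrative and not part of augsynth.

```r
# Toy long-format panel: 3 units x 4 time periods (illustrative data only)
toy <- expand.grid(state = c("A", "B", "C"), centered_time = -2:1)
toy$outcome <- seq_len(nrow(toy))
toy$outcome[toy$state == "B" & toy$centered_time == -1] <- NA  # inject a gap

# Count missing outcomes in each time period
missing_by_time <- tapply(is.na(toy$outcome), toy$centered_time, sum)
print(missing_by_time)

# Time periods with at least one missing outcome
bad_times <- as.numeric(names(missing_by_time)[missing_by_time > 0])
print(bad_times)

# A balanced panel has exactly one row per unit and time period
balanced <- all(table(toy$state, toy$centered_time) == 1)
print(balanced)
```

With the real data, the same check would use `people_fully_vaccinated_per_hundred` as the outcome column and `centered_time` as the time column.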

ebenmichael commented 3 years ago

@davidnathanlang is right. This is coming from the underlying optimizer because there's missing data.

This dataset is missing outcomes at these centered times: -120, -119, -118, -116, -115, -114, -86, and 19.

If you drop those time periods, everything will run. I'll work on catching this and giving a better error message.
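
As a rough sketch of that workaround, the time periods with any missing outcome can be dropped before fitting. `drop_incomplete_times` is a hypothetical helper written for this example, not an augsynth function, and it is demonstrated on toy data rather than the dataset from the reprex.

```r
# Hypothetical helper, not part of augsynth: drop every time period in which
# any unit has a missing outcome, and report which periods were dropped.
drop_incomplete_times <- function(data, outcome, time) {
  # TRUE for each time period that contains at least one NA outcome
  has_na <- tapply(is.na(data[[outcome]]), data[[time]], any)
  bad_times <- names(has_na)[has_na]
  if (length(bad_times) > 0) {
    message("Dropping time periods with missing outcomes: ",
            paste(bad_times, collapse = ", "))
  }
  data[!(as.character(data[[time]]) %in% bad_times), , drop = FALSE]
}

# Toy long-format panel with one missing outcome (illustrative data only)
toy <- expand.grid(state = c("A", "B", "C"), centered_time = -2:1)
toy$outcome <- seq_len(nrow(toy))
toy$outcome[toy$state == "B" & toy$centered_time == -1] <- NA

clean <- drop_incomplete_times(toy, "outcome", "centered_time")
print(sort(unique(clean$centered_time)))
```

For the reprex, the equivalent call would be `drop_incomplete_times(dat, "people_fully_vaccinated_per_hundred", "centered_time")`, with the result passed to `augsynth()` in place of `dat`.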