bnowok / synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

RStudio Freezing When Dataframe Includes Factor #18

Open benjaminwnelson opened 3 years ago

benjaminwnelson commented 3 years ago

Synthpop is working with numeric data, but anytime I include a factor with more than 2 levels it causes the program to freeze. Any thoughts on why this might be the case? Thanks!

Sinan-Yavuz commented 3 years ago

I have the same problem, RStudio crashes.

gillian-raab commented 3 years ago

Seems very odd. Which version of synthpop are you using: CRAN or GitHub?

Gillian M Raab

Emeritus Professor, Edinburgh Napier University

Part-time Research Fellow

Administrative Data Research Centre - Scotland

Edinburgh

+44 7748 678 551



Sinan-Yavuz commented 3 years ago

I am using the CRAN version, 1.6.0.

benjaminwnelson commented 3 years ago

I am using R 4.1.0.

gillian-raab commented 3 years ago

And is synthpop from CRAN?


gillian-raab commented 3 years ago

I'm using R version 4.5, either on its own or in RStudio, and this code, which uses 3 non-binary factors, runs fine. Can you see if it works for you and/or send us the code that failed?

library(synthpop)
help(synthpop) # version 1.6-0
ods <- SD2011[, c(1, 4:6)]
tt <- syn(ods)
compare(tt, ods)

Best Gillian


benjaminwnelson commented 3 years ago

I'm using synthpop 1.6-0.

When I run your code it works perfectly. I tried it on my dataset again with 17 variables and 2,000 observations and it can't get past the gender variable. I tried only loading synthpop and no other packages and the same thing happened.

gillian-raab commented 3 years ago

Very odd. Can you send me the data set that caused problems? If you can't for confidentiality reasons, please send the error message you received and some details of the variables, so I can suggest other ways. G


wbuchanan commented 1 year ago

Using synthpop version 1.8-0 with R 4.2.3.

set.seed(7779311)
library(haven)
library(dplyr)
filenm <- "https://github.com/OpenSDP/faketucky/raw/master/faketucky.dta"
df <- haven::read_dta(filenm, 
      col_select = c("sid", "first_dist_code", "first_hs_code", 
                     "first_hs_alt", "first_hs_urbanicity", "chrt_ninth", 
                     "male", "race_ethnicity", "frpl_ever_in_hs", 
                     "sped_ever_in_hs", "lep_ever_in_hs", "gifted_ever_in_hs",
                     "ever_alt_sch_in_hs", "scale_score_6_math", 
                     "scale_score_6_read", "scale_score_8_math", 
                     "scale_score_8_read", "pct_absent_in_hs", 
                     "pct_excused_in_hs", "avg_gpa_hs", "scale_score_11_eng", 
                     "scale_score_11_math", "scale_score_11_read",
                     "scale_score_11_comp", "collegeready_ever_in_hs", 
                     "careerready_ever_in_hs", "ap_ever_take_class", 
                     "last_acadyr_observed", "transferout", "dropout", 
                     "still_enrolled", "ontime_grad", "chrt_grad", "hs_diploma",
                     "enroll_yr1_any", "enroll_yr1_2yr", "enroll_yr1_4yr",
                     "enroll_yr2_any"))
names(df) <- c("stdid", "distid", "schcd", "altsch", "urbanicity", 
               "cohort", "male", "race", "frleverhs", "swdeverhs", "eleverhs",
               "tageverhs", "alteverhs", "mthss6", "rlass6", "mthss8", 
               "rlass8", "pctabshs", "pctexcusedhs", "hsgpa", "acteng11", 
               "actmth11", "actrla11", "actcmp11", "evercollrdyhs", 
               "evercarrdyhs", "aptakenever", "lastobsyr", "transfer", 
               "dropout", "stillenrolled", "gradontime", "gradcohort", 
               "diploma", "yr1psenrany", "yr1psenr2yr", "yr1psenr4yr", 
               "yr2psenrany")
df$schid <- paste0(df$distid, df$schcd)
validSchools <- data.frame("schid" = sample(unique(df$schid), size = 60))
df <- dplyr::inner_join(df, validSchools, by = "schid")
df$altsch <- as.factor(df$altsch)
df$cohort <- as.factor(df$cohort)
df$male <- as.factor(df$male)
df$swdeverhs <- as.factor(df$swdeverhs)
df$eleverhs <- as.factor(df$eleverhs)
df$schid <- as.factor(df$schid)
df$tageverhs <- as.factor(df$tageverhs)
df$alteverhs <- as.factor(df$alteverhs)
df$evercollrdyhs <- as.factor(df$evercollrdyhs)
df$evercarrdyhs <- as.factor(df$evercarrdyhs)
df$aptakenever <- as.factor(df$aptakenever)
df$transfer <- as.factor(df$transfer)
df$dropout <- as.factor(df$dropout)
df$stillenrolled <- as.factor(df$stillenrolled)
df$gradontime <- as.factor(df$gradontime)
df$diploma <- as.factor(df$diploma)
df$yr1psenrany <- as.factor(df$yr1psenrany)
df$yr1psenr2yr <- as.factor(df$yr1psenr2yr)
df$yr1psenr4yr <- as.factor(df$yr1psenr4yr)
df$yr2psenrany <- as.factor(df$yr2psenrany)
df$schid <- as.factor(df$schid)
df$race <- as.factor(df$race)
df$urbanicity <- as.factor(df$urbanicity)
df$frleverhs <- as.factor(df$frleverhs)
df$lastobsyr <- as.factor(df$lastobsyr)
df$gradcohort <- as.factor(df$gradcohort)
df <- df[-c(2, 3)]
library(synthpop)
# This works fine and executes relatively quickly
syn <- synthpop::syn(df)
# This freezes and fails to execute every time:
syn2 <- synthpop::syn(df[-c(1)], models = TRUE,
                      visit.sequence = c("schid", "altsch", "male", "race", "cohort", "urbanicity",
                                         "frleverhs", "swdeverhs", "eleverhs", "tageverhs", "alteverhs",
                                         "mthss6", "rlass6", "mthss8", "rlass8", "pctabshs", "pctexcusedhs",
                                         "aptakenever", "lastobsyr", "transfer", "dropout", "stillenrolled",
                                         "hsgpa", "gradontime", "gradcohort", "diploma", "evercollrdyhs",
                                         "evercarrdyhs", "actmth11", "actrla11", "acteng11", "actcmp11",
                                         "yr1psenr2yr", "yr1psenr4yr", "yr1psenrany", "yr2psenrany"))

The second call to syn() should sample school identifiers and then start modeling student-level attributes. It fails consistently. It uses only a single core, even though the machine has 12 available, and it does not use all of the available RAM.

gillian-raab commented 1 year ago

Dear William, I have diagnosed your problem and can offer a couple of solutions. Before I supply you with details, can you please let me know if you get this? Then I can send you code and/or we could have a chat about how you are running your synthesis. I can offer some suggestions on how you might improve it.

Best wishes

Gillian

Gillian M Raab Research Fellow (part-time) Scottish Centre for Administrative Data Research My core working days are Tuesdays and Thursdays Though I sometimes swap them for other days 07748 678 551



wbuchanan commented 1 year ago

I did receive your email and am all ears for a solution. Thanks for the quick response. I hope my example might be useful for some of the similar issues others have raised.

gillian-raab commented 1 year ago

Dear William, here is a revised and edited version of your code. You should be able to rerun your synthesis by one of two methods. Here are some relevant comments.

Sometimes CART models can get stuck. That seems to be what happened in your case with synthpop's default cart method, which uses the rpart function from the rpart package. Usually the cause is some curious pattern in the data, often involving small numbers. In this sort of case I alter the model in some way to see if I can get the synthesis to run. My first try is usually another version of CART: there is a second cart option that uses the ctree function (I think it comes from the party package). When something goes wrong I usually just try the other routine, as this often cures it. It worked here, and the code to create syn4 does it. I prefer the ctree method in general because it provides nice model plots (see code). We don't have it as the default because it too can occasionally fail with a cryptic error message. The package authors have not been able to help with this.
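A minimal sketch of the ctree switch described above. This is not the code that was attached to the reply (the attachment is not reproduced in this thread); it reuses the SD2011 subset from the earlier test in this thread, and the seed value and the `requireNamespace` guard are assumptions added so the snippet runs anywhere.

```r
# Sketch only: rerun a synthesis with ctree instead of the default rpart-based CART.
if (requireNamespace("synthpop", quietly = TRUE)) {
  library(synthpop)

  # SD2011 ships with synthpop; same columns as the earlier test in this thread.
  ods <- SD2011[, c(1, 4:6)]

  # A single method string applies that method to the modelled variables.
  # The seed is arbitrary and only makes the run repeatable.
  syn_ctree <- syn(ods, method = "ctree", seed = 2023)

  # Show which method was actually used for each variable.
  print(syn_ctree$method)
}
```

If the rpart-based run hangs on a real data set, the same `method = "ctree"` argument can be added to the failing call while keeping the other arguments unchanged.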

But I went a bit further to see why rpart had gone wrong. It can often be due to small numbers and/or exact dependencies between variables in the initial models. In your case it was because altsch is actually at the school level, and the variable male has only 2 missing values. I experimented a bit to see what would work. Changing the order of synthesis can usually cure this. Moving the school ID variable to the end of the synthesis allowed it to run OK (syn5).
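The reordering can be sketched in base R. The helper name `move_to_end` and the object name `syn_last` are hypothetical, and the visit sequence is shortened here for readability; this is a guess at the shape of the change, not the attached code.

```r
# Shortened visit sequence from the thread (the full one has 36 variables).
visit_seq <- c("schid", "altsch", "male", "race", "cohort", "urbanicity")

# Hypothetical helper: move one variable to the end, keeping the others in order.
move_to_end <- function(vars, name) c(setdiff(vars, name), name)

new_seq <- move_to_end(visit_seq, "schid")
print(new_seq)

# With the full 36-name sequence and the data frame `df` built earlier in the
# thread, the rerun would then look something like this (not run here):
# syn_last <- synthpop::syn(df[-1], models = TRUE, visit.sequence = new_seq)
```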

I don't know why you wanted to put this in first. There are several reasons why this is a bad idea.

  1. schid divides the data into subgroups, many of which are small. This means they will be unlikely to appear in tree-based models.
  2. altsch is derived from schid.
  3. The dependency of one variable on others does not require it to be synthesised first. You can see this if you look at the model for schid that the code prints out for syn5.
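The first two points can be checked before running a synthesis. Below is a rough base R sketch on a toy data frame; the values are invented and the size threshold of 3 is arbitrary. On the real data the same checks would be applied to `df$schid` and `df$altsch`.

```r
# Toy stand-in for the school-level columns (invented values).
toy <- data.frame(
  schid  = factor(c("A", "A", "A", "B", "B", "C")),
  altsch = factor(c(0, 0, 0, 1, 1, 0))
)

# Point 1: how many records fall in each school, and which schools are small?
group_sizes <- table(toy$schid)
small <- names(group_sizes)[group_sizes < 3]
print(small)

# Point 2: altsch is fully determined by schid if every school
# has exactly one distinct altsch value.
n_vals <- tapply(toy$altsch, toy$schid, function(x) length(unique(x)))
determined <- all(n_vals == 1)
print(determined)
```

An exact dependency like this, combined with many small groups, is the kind of pattern that the reply above identifies as a likely cause of the stuck rpart fit.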

Where are you and what are you using our package for? We are always interested to know.

Best wishes

Gillian


wbuchanan commented 1 year ago

Hi @gillian-raab,

I included the school ID variable first on purpose, to sample school IDs (hopefully in a manner that would retain the marginal distribution of school IDs). I initially had school-level variables listed first in the visit sequence, followed by demographic characteristics of students, and then test scores and outcomes. In terms of use, this particular example is purely for demonstration purposes, to explain how synthetic data can be used for privacy protection to increase access to data.

That said, I didn't see any other code listed, but can at least try making some of the modifications you mentioned.

gillian-raab commented 1 year ago

Apologies, William. The code appears to have failed to attach. Here it is now. It is not usually necessary to put a variable at the start of the synthesis to maintain the marginal distributions.

Good luck with using synthpop.

Gillian
