LotteVanUtrecht opened this issue 2 years ago (status: Open).
Can I suggest you try this: put these two variables at the start of your synthesis and use the method "catall" for both of them. Define the empty cells as structural zeros (see the catall documentation for how to do this). Then synthesise the rest of your variables as usual.
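Gillian's suggestion depends on knowing which combinations of the two variables never occur in the real data, because those are the cells to declare as structural zeros. The sketch below is plain base R, not synthpop code. It lists the empty cells of the two-way table for toy data; the values and the helper name `empty_cells` are hypothetical. How those cells are then passed to catall should be taken from the catall documentation, and no structural-zero syntax is assumed here.

```r
# Toy stand-in for the real data (hypothetical values, not from the issue).
real <- data.frame(
  onderwijsstructuur = c("HAVO", "HAVO", "MAVO", "MAVO", "MAVO"),
  owsoort            = c("HAVO", "HAVO", "MAVO", "VMBO", "VMBO"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: list the cells of the cross-table of `vars` that are
# empty in `data`, i.e. combinations that never occur (structural-zero candidates).
empty_cells <- function(data, vars) {
  tab <- table(data[vars])  # counts for every combination of the variables' levels
  cells <- as.data.frame(tab, stringsAsFactors = FALSE)
  cells[cells$Freq == 0, vars, drop = FALSE]
}

zeros <- empty_cells(real, c("onderwijsstructuur", "owsoort"))
print(zeros)
```

With factor columns, `table()` also counts unused factor levels, so those would show up as empty cells too.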
Good luck and let me know if this works. Gillian
Gillian M Raab
Emeritus Professor, Edinburgh Napier University
Part-time Research Fellow
Administrative Data Research Centre - Scotland
Edinburgh
+44 7748 678 551
From: LotteVanUtrecht @.>
Sent: 15 June 2022 15:24
To: bnowok/synthpop @.>
Cc: Subscribed @.***>
Subject: [bnowok/synthpop] Feature request: restrict combinations of values in the synthetic data to combinations appearing in the real data/ (Issue #22)
This email was sent to you by someone outside the University. You should only click on links or attachments if you are certain that the email is genuine and the content is safe.
We are synthesizing a dataset with two related variables, "onderwijsstructuur" and "owsoort" (which here hold information about a school and an individual student, respectively). We would like the synthetic data to include only those combinations of the two variables that are present in the real data. Part of the cross-table of the two variables is included below.
If you look only at the second row (the case where onderwijsstructuur == "HAVO"), the problem is easily solved: give syn() a rule and rvalue along these lines (the variable name inside the rule must be unquoted, otherwise the condition compares two string literals and is always FALSE):
    params[["rules"]] <- list(owsoort = "onderwijsstructuur == 'HAVO'")
    params[["rvalues"]] <- list(owsoort = "HAVO")
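The HAVO rule works because, in the real data, HAVO schools are paired with a single owsoort value. A minimal base R sketch, assuming toy data and a hypothetical helper `determined_values`, finds every onderwijsstructuur level for which owsoort is fixed in this way and writes matching rule strings (R conditions stored as text, which is the form syn() rules take). It does not assume anything about how several rules for one variable would be passed to syn().

```r
# Toy stand-in for the real data (hypothetical values, not from the issue).
real <- data.frame(
  onderwijsstructuur = c("HAVO", "HAVO", "MAVO", "MAVO", "MAVO"),
  owsoort            = c("HAVO", "HAVO", "MAVO", "VMBO", "VMBO"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: for each level of `cond`, keep it only if `target`
# takes exactly one value at that level, and return that value.
determined_values <- function(data, cond, target) {
  groups <- split(as.character(data[[target]]), data[[cond]])
  single <- vapply(groups, function(v) length(unique(v)) == 1, logical(1))
  vapply(groups[single], function(v) v[1], character(1))
}

fixed <- determined_values(real, "onderwijsstructuur", "owsoort")

# One rule string and one replacement value per fully determined level.
rule_strings <- sprintf("onderwijsstructuur == '%s'", names(fixed))
rule_values  <- unname(fixed)
print(rule_strings)  # "onderwijsstructuur == 'HAVO'"
```

Levels that are dropped, such as MAVO in this toy data, have more than one owsoort value; those are the rows a single rule/rvalue pair cannot cover.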
However, when we want to include the cases in the fourth row (where "onderwijsstructuur"=="MAVO"), we run into two problems:
It's possible that a good alternative can already be built with the package's current features and we simply overlooked it. In some cases, synthesizing the two variables together with the 'catall' method is a good alternative. That will not work here, however, because "onderwijsstructuur" is already synthesized together with other variables, and we feel that adding "owsoort" to that group would draw too much personal information from single individuals.
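Whichever route is taken, the requirement stated in this issue can be checked after synthesis by looking for combinations in the synthetic data that never occur in the real data. A minimal base R sketch; the helper `unseen_combinations` and the toy data are hypothetical. For a synthpop result, the synthetic data frame would be the `syn` component of the object returned by syn(), assuming a single synthetic copy (m = 1).

```r
# Toy real and synthetic data (hypothetical values, not from the issue).
real <- data.frame(
  onderwijsstructuur = c("HAVO", "HAVO", "MAVO", "MAVO", "MAVO"),
  owsoort            = c("HAVO", "HAVO", "MAVO", "VMBO", "VMBO"),
  stringsAsFactors   = FALSE
)
synth <- data.frame(
  onderwijsstructuur = c("HAVO", "MAVO", "HAVO"),
  owsoort            = c("HAVO", "VMBO", "MAVO"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: return the rows of `synth` whose combination of
# `vars` does not appear anywhere in `real`.
unseen_combinations <- function(real, synth, vars) {
  # Collapse each row's values into one key; the unit-separator character
  # is unlikely to occur inside category labels.
  key <- function(d) do.call(paste, c(unname(as.list(d[vars])), sep = "\u001f"))
  synth[!(key(synth) %in% key(real)), vars, drop = FALSE]
}

unseen <- unseen_combinations(real, synth, c("onderwijsstructuur", "owsoort"))
print(unseen)  # the HAVO/MAVO row, which never occurs in `real`
```

An empty result means every synthetic combination was observed in the real data.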
Best, Lotte
Reply to this email directly, view it on GitHub (https://github.com/bnowok/synthpop/issues/22), or unsubscribe. You are receiving this because you are subscribed to this thread.
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
I've just read your email more carefully and I see that you have already considered the catall option. Do you have to synthesise "onderwijsstructuur" first? Could it not come later? Best, Gillian