Open MaxGhenis opened 5 years ago
Extremely helpful. Very much on point. I have put it in our Zotero group (see #2). My main question is what counts as a true match for them. I will try to add questions to the Google doc.
There is a lot of disclosure-related commentary on Twitter, particularly at: https://twitter.com/ianschmutte https://twitter.com/larsvil https://twitter.com/john_abowd
Of possible interest, in Zotero:
Benedetto, Stinson, and Abowd (2013), "The Creation and Use of the SIPP Synthetic Beta," has a bit more detail on the process, but doesn't address the key question of whether the concept of a "true" match between synthetic and actual files is useful when they only share gender and marital status. Here's a Google Doc if you want to comment. More:
The Census Bureau has used synthetic data in the past, for example in producing a synthetic SIPP. I reviewed some of their materials; here are some highlights:
Benedetto and Stinson (2015)
This describes the methodology and metrics used to assess disclosure risk. Given its relevance, I created a Google Docs copy with comments.
Synthesis
Disclosure risk
Thoughts
Since we're not starting with real records the way they are, we don't have a "true" mapping on which to base disclosure risk metrics. But the idea of quantifying the likelihood of correctly inferring data based on what's available is intriguing, and could help move from less-interpretable distance metrics to more-interpretable probabilities.
Jarmin, Louis, Miranda (2014)
This lacks specifics that would be useful for our project.
If anyone knows more about what the Census Bureau does, please share here.