Open jhseeman opened 3 months ago
I would appreciate functionality/best practices for working with continuous variables and mixed-type data.
Can you share a little more detail about the linkage attack functionality?
I've done some crude work on this. Let me know how I can help! The discriminator workflow we added is pretty flexible and leverages library(tidymodels).
@awunderground I updated this roadmap based on what was merged in. If you have some crude work done already on attribute inferences, any chance you'd be willing to add it to a branch? I can massage it to work with the 0.0.4 updates; I think this will be pretty flexible since it should probably take a tidymodels workflow as input.
Disclosure risk metrics planning
This issue will be used to plan updates for disclosure risk metrics in syntheval.

Confidential data baseline assessments (disc_baseline.R)
- disc_baseline_lra(conf_tables): linear reconstruction attack from a collection of count tables (link)
- disc_baseline_make_canaries(conf_data): create artificial high-risk records for holdout data (e.g., "canaries") (link)

Membership inferences from synthetic data (disc_qid_mi.R)
- disc_mit(...): updates for multiple synthetic data replicates
- disc_mit(...): updates for disaggregated records
- disc_mit(...): updates for mechanism adaptivity (edit: deferred to 0.0.5)
- disc_linkage_recon(synth_data, recon): linkage attack from synthetic data and partial reconstruction

Attribute inferences
- disc_ait(synth_data, test_records): attribute inference for test_records using synthetic data-based models
- disc_ait_compare(synth_data, test_records, holdout_data): attribute inference for test_records, comparing differences between using synthetic and holdout data (link)
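
To make the attribute inference comparison concrete, here is a rough base R sketch of the idea behind disc_ait_compare. It is not the syntheval implementation. The helper names (ait_rmse, ait_compare_sketch, make_data), the toy "synthesizer", the lm model, and the RMSE metric are all assumptions made for illustration. The planned function would more likely accept a tidymodels workflow, as discussed in the comments.

```r
# Hypothetical helper (not part of syntheval): fit a linear model on
# `train_data` and return the RMSE of its predictions on `test_records`.
ait_rmse <- function(train_data, test_records, formula) {
  fit <- lm(formula, data = train_data)
  preds <- predict(fit, newdata = test_records)
  target <- all.vars(formula)[1]  # name of the attribute being inferred
  sqrt(mean((test_records[[target]] - preds)^2))
}

# Hypothetical comparison: how well can an attacker infer the target
# attribute of `test_records` using a model trained on synthetic data
# versus a model trained on holdout data?
ait_compare_sketch <- function(synth_data, test_records, holdout_data, formula) {
  data.frame(
    source = c("synthetic", "holdout"),
    rmse = c(
      ait_rmse(synth_data, test_records, formula),
      ait_rmse(holdout_data, test_records, formula)
    ),
    stringsAsFactors = FALSE
  )
}

# Toy data, assumed purely for illustration.
set.seed(20240101)
make_data <- function(n) {
  age <- runif(n, 20, 65)
  data.frame(age = age, income = 1000 * age + rnorm(n, sd = 5000))
}
conf_data <- make_data(200)     # stands in for the confidential data
holdout_data <- make_data(200)  # same distribution, never seen by the synthesizer

# Toy "synthesizer": resample the confidential rows and add noise to income.
synth_data <- conf_data[sample(nrow(conf_data), replace = TRUE), ]
synth_data$income <- synth_data$income + rnorm(nrow(synth_data), sd = 1000)

test_records <- conf_data[1:50, ]  # records the attacker targets

result <- ait_compare_sketch(synth_data, test_records, holdout_data, income ~ age)
print(result)
```

If the synthetic-data model infers the attribute for test_records noticeably better than the holdout-data model, that gap is one possible signal that the synthetic data carries record-specific information rather than only population-level structure. A simple linear model like this one will usually show little gap; a more flexible model would be needed to detect memorization.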