jhseeman commented 3 months ago

Disclosure risk metrics planning

This issue will be used to plan updates for disclosure risk metrics in syntheval.

Confidential data baseline assessments

[X] Methods for identifying existing confidential records with high disclosure risk (edit: now in disc_baseline.R)
[ ] Methods for identifying arbitrary records worth evaluating in holdouts (edit: deferred to 0.0.5)
- [ ] disc_baseline_lra(conf_tables): linear reconstruction attack from a collection of count tables (link)
- [ ] disc_baseline_make_canaries(conf_data): create artificial high-risk records ("canaries") for holdout data (link)
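
For discussion, here is a rough base R sketch of what canary creation could look like. It is not the planned syntheval implementation: the function name `make_canaries_sketch()`, the `n` and `shift` arguments, and the rule itself are all assumptions made for illustration. The rule pushes numeric columns beyond the observed maximum and uses the rarest observed category elsewhere. It only handles numeric, character, and factor columns.

```r
# Hypothetical helper for illustration only; not part of syntheval.
make_canaries_sketch <- function(conf_data, n = 5, shift = 3) {
  canary_cols <- lapply(conf_data, function(col) {
    if (is.numeric(col)) {
      # Place numeric values `shift` standard deviations above the observed maximum.
      rep(max(col, na.rm = TRUE) + shift * stats::sd(col, na.rm = TRUE), n)
    } else {
      # Use the rarest category that actually appears in the data.
      counts <- table(col)
      counts <- counts[counts > 0]
      rarest <- names(counts)[which.min(counts)]
      if (is.factor(col)) {
        factor(rep(rarest, n), levels = levels(col))
      } else {
        rep(rarest, n)
      }
    }
  })
  as.data.frame(canary_cols, stringsAsFactors = FALSE)
}

# Toy confidential data (made up for the example).
conf_data <- data.frame(
  age = c(30, 40, 50),
  state = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
canaries <- make_canaries_sketch(conf_data, n = 2)
```

The resulting rows could then be appended to a holdout set to check whether a disclosure metric flags records that are high-risk by construction.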

Membership inferences from synthetic data

[X] Quasi-identifier probabilistic membership inference (edit: added in disc_qid_mi.R)
- [X] Partition selection probabilities from multiple replicates
- [X] Membership empirical intervals from multiple replicates
[X] Membership inference updates for arbitrary holdouts (link)
- [X] disc_mit(...) updates for multiple synthetic data replicates
- [X] disc_mit(...) updates for disaggregated records
- [ ] disc_mit(...) updates for mechanism adaptivity (edit: deferred to 0.0.5)
[ ] Linkage attacks (edit: deferred to 0.0.5)
- [ ] disc_linkage_recon(synth_data, recon): Linkage attack from synthetic data and partial reconstruction

Attribute inferences

[ ] disc_ait(synth_data, test_records): attribute inference for test_records using synthetic data-based models
[ ] disc_ait_compare(synth_data, test_records, holdout_data): attribute inference for test_records comparing differences between using synthetic and holdout data (link)
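
To make the comparison idea concrete, here is a minimal base R sketch. It assumes a numeric target and a plain linear model. The name `ait_compare_sketch()` and its `formula` argument are hypothetical, and the eventual `disc_ait_compare()` may look quite different (for example, by taking a tidymodels workflow instead). The sketch fits the same model once on the synthetic data and once on the holdout data, predicts the target for `test_records`, and reports the gap in prediction error.

```r
# Hypothetical sketch for illustration only; not the syntheval API.
ait_compare_sketch <- function(synth_data, test_records, holdout_data, formula) {
  # The left-hand side of the formula is the attribute being inferred.
  target <- all.vars(formula)[1]

  # Fit on one training source and score predictions on the test records.
  rmse_from <- function(train) {
    fit <- stats::lm(formula, data = train)
    pred <- stats::predict(fit, newdata = test_records)
    sqrt(mean((test_records[[target]] - pred)^2))
  }

  rmse_synth <- rmse_from(synth_data)
  rmse_holdout <- rmse_from(holdout_data)

  # A positive gap means the synthetic-data model predicts the test records better.
  c(synth = rmse_synth, holdout = rmse_holdout, gap = rmse_holdout - rmse_synth)
}

# Toy data (made up for the example).
synth_data <- data.frame(x = 1:10, y = 2 * (1:10))
holdout_data <- data.frame(x = 1:10, y = 1:10)
test_records <- data.frame(x = c(2, 4), y = c(4, 8))

result <- ait_compare_sketch(synth_data, test_records, holdout_data, y ~ x)
```

Under these assumptions, a large positive `gap` would suggest the synthetic data reveals more about the test records than an independent holdout sample does. A gap near zero is more consistent with population-level inference.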

awunderground commented 3 months ago

Confidential data baseline assessments

I would appreciate functionality/best practices for working with continuous variables and mixed-type data.

Membership inferences from synthetic data

Can you share a little more detail about the linkage attack functionality?

I can imagine major differences between methods for partially and fully synthetic data.
What is the direction of the linkage?

Attribute inferences

I've done some crude work on this. Let me know how I can help! The discriminator workflow we added is pretty flexible and leverages library(tidymodels).

jhseeman commented 5 days ago

@awunderground I updated this roadmap based on what was merged in. If you already have some crude work on attribute inferences, would you be willing to add it to a branch? I can adapt it to work with the 0.0.4 updates. I think this will be pretty flexible, since it should probably take a tidymodels workflow as input.

UrbanInstitute / syntheval

Disclosure risk metric planning / tracking #87