CC-HIC / ccanonym

Critical care data anonymisation package

first release #14

Open sinanshi opened 7 years ago

sinanshi commented 7 years ago

k = [5, 10], l = 0.32

l-diversity is assessed using entropy l-diversity. Entropy l-diversity, the most complex of the l-diversity definitions, defines the entropy of an equivalence class E as Entropy(E) = −Σ_s p(E,s) log(p(E,s)), where the sum runs over every value s in the domain of the sensitive attribute and p(E,s) is the fraction of records in E that have the sensitive value s. A table has entropy l-diversity when Entropy(E) ≥ log(l) for every equivalence class E.
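To make the definition concrete, here is a minimal Python sketch of the check. It is only an illustration of the formula and is not taken from ccanonym; the function names, the column names (`age_band`, `sex`, `outcome`) and the toy rows are all made up for the example, and the natural logarithm is assumed.

```python
import math
from collections import Counter, defaultdict


def class_entropy(sensitive_values):
    """Entropy of one equivalence class: -sum_s p(E,s) * log(p(E,s))."""
    counts = Counter(sensitive_values)
    n = len(sensitive_values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def is_entropy_l_diverse(records, quasi_identifiers, sensitive, l):
    """True when every equivalence class E satisfies Entropy(E) >= log(l)."""
    # Group the sensitive values by the tuple of quasi-identifier values;
    # each group is one equivalence class.
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive])
    threshold = math.log(l)
    return all(class_entropy(values) >= threshold for values in classes.values())


# Hypothetical toy table: two equivalence classes over (age_band, sex).
rows = [
    {"age_band": "60-69", "sex": "M", "outcome": "alive"},
    {"age_band": "60-69", "sex": "M", "outcome": "dead"},
    {"age_band": "70-79", "sex": "F", "outcome": "alive"},
    {"age_band": "70-79", "sex": "F", "outcome": "alive"},
    {"age_band": "70-79", "sex": "F", "outcome": "dead"},
]

if __name__ == "__main__":
    print(is_entropy_l_diverse(rows, ["age_band", "sex"], "outcome", l=1.5))
    print(is_entropy_l_diverse(rows, ["age_band", "sex"], "outcome", l=2))
```

The second class has an uneven split of outcomes, so its entropy is lower than that of the first class and it is the one that decides whether the table passes for a given l.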

https://files.slack.com/files-pri/T0BR4BM7H-F35ATC2GJ/2014.pdf

sinanshi commented 7 years ago

Hi Steve, the extracts (k=[1,10]) are ready in /data/anon. These are only the sdc objects. The real extraction takes too long because deltaTime has to be run over the entire dataset, but the results will appear gradually in /data.

The sdc object contains more information. sdc$data is the episode table and sdc$sdc is the sdc object, which tells you more about the SDC operations and their effects.