glassonion1 / anonypy

Anonymization library for Python. Protect the privacy of individuals.
MIT License
k-anonymity l-diversity mondrian pandas python python3 t-closeness

AnonyPy

AnonyPy is an anonymization library for Python. It provides the following privacy-preserving techniques for anonymization: k-anonymity, l-diversity, and t-closeness.

The Anonymization method
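
The repository's topic tags list Mondrian, a partitioning approach commonly used to reach k-anonymity. The sketch below is a minimal, numeric-only illustration of that idea under simplifying assumptions, not AnonyPy's actual implementation. The function name `mondrian_partitions` and the sample records are hypothetical.

```python
from statistics import median


def mondrian_partitions(records, dims, k):
    """Split `records` (tuples of numbers) into groups of at least k records.

    Hypothetical helper for illustration only; not part of AnonyPy.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    def span(d):
        values = [r[d] for r in records]
        return max(values) - min(values)

    # Try the dimension with the widest value range first.
    for d in sorted(dims, key=span, reverse=True):
        split = median(r[d] for r in records)
        left = [r for r in records if r[d] < split]
        right = [r for r in records if r[d] >= split]
        # Only accept a cut that leaves at least k records on each side.
        if len(left) >= k and len(right) >= k:
            return (mondrian_partitions(left, dims, k)
                    + mondrian_partitions(right, dims, k))
    # No allowable cut: this group becomes one equivalence class.
    return [records]


# Hypothetical records: (age, income in thousands).
records = [(25, 30), (27, 32), (31, 45), (35, 50), (42, 48), (47, 70)]
partitions = mondrian_partitions(records, dims=[0, 1], k=2)
for part in partitions:
    print(len(part), part)
```

Each returned group would then be generalized, for example by replacing a numeric column with its minimum-maximum range, so that all records in the group look identical on those columns.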

Install

$ pip install anonypy

Usage

import anonypy
import pandas as pd

data = [
    [6, "1", "test1", "x", 20],
    [6, "1", "test1", "x", 30],
    [8, "2", "test2", "x", 50],
    [8, "2", "test3", "w", 45],
    [8, "1", "test2", "y", 35],
    [4, "2", "test3", "y", 20],
    [4, "1", "test3", "y", 20],
    [2, "1", "test3", "z", 22],
    [2, "2", "test3", "y", 32],
]

columns = ["col1", "col2", "col3", "col4", "col5"]
categorical = {"col2", "col3", "col4"}

df = pd.DataFrame(data=data, columns=columns)

# Mark the categorical columns with the pandas "category" dtype.
for name in categorical:
    df[name] = df[name].astype("category")

# Columns to generalize, and the sensitive column to keep as-is.
feature_columns = ["col1", "col2", "col3"]
sensitive_column = "col4"

p = anonypy.Preserver(df, feature_columns, sensitive_column)
rows = p.anonymize_k_anonymity(k=2)

dfn = pd.DataFrame(rows)
print(dfn)

Original data

   col1 col2   col3 col4  col5
0     6    1  test1    x    20
1     6    1  test1    x    30
2     8    2  test2    x    50
3     8    2  test3    w    45
4     8    1  test2    y    35
5     4    2  test3    y    20
6     4    1  test3    y    20
7     2    1  test3    z    22
8     2    2  test3    y    32
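
As a quick check, the original table is not 2-anonymous with respect to col1, col2, and col3: most combinations of those values occur only once. A minimal sketch in plain Python, independent of AnonyPy, that counts how many records share each combination (the variable names are illustrative):

```python
from collections import Counter

# Values of (col1, col2, col3) for the nine original records.
records = [
    (6, "1", "test1"),
    (6, "1", "test1"),
    (8, "2", "test2"),
    (8, "2", "test3"),
    (8, "1", "test2"),
    (4, "2", "test3"),
    (4, "1", "test3"),
    (2, "1", "test3"),
    (2, "2", "test3"),
]

# Size of each group of records sharing the same three values.
class_sizes = Counter(records)
smallest = min(class_sizes.values())

# A table is k-anonymous only if every group has at least k records.
print(f"smallest group: {smallest}")
print(f"2-anonymous: {smallest >= 2}")
```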

The resulting anonymized data is shown below (2-anonymity is guaranteed).

  col1 col2         col3 col4  count
0  2-4    2        test3    y      2
1  2-4    1        test3    y      1
2  2-4    1        test3    z      1
3  6-8    1  test1,test2    x      2
4  6-8    1  test1,test2    y      1
5    8    2  test3,test2    w      1
6    8    2  test3,test2    x      1
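
Each output row appears to describe one col4 value within one generalized group, with count giving the number of original records. Under that reading (an interpretation of the table, not a documented guarantee), the 2-anonymity claim can be checked by summing count over rows that share the same col1, col2, and col3. The sketch below uses plain Python and copies the rows from the table. It also counts the distinct col4 values per group, which is the quantity l-diversity constrains.

```python
from collections import defaultdict

# Rows copied from the anonymized table: (col1, col2, col3, col4, count).
rows = [
    ("2-4", "2", "test3", "y", 2),
    ("2-4", "1", "test3", "y", 1),
    ("2-4", "1", "test3", "z", 1),
    ("6-8", "1", "test1,test2", "x", 2),
    ("6-8", "1", "test1,test2", "y", 1),
    ("8", "2", "test3,test2", "w", 1),
    ("8", "2", "test3,test2", "x", 1),
]

group_size = defaultdict(int)        # original records per generalized group
sensitive_values = defaultdict(set)  # distinct col4 values per group

for col1, col2, col3, col4, count in rows:
    key = (col1, col2, col3)
    group_size[key] += count
    sensitive_values[key].add(col4)

for key, size in group_size.items():
    print(key, "size:", size, "distinct col4:", len(sensitive_values[key]))

smallest_group = min(group_size.values())
print("smallest group size:", smallest_group)
```

Every group here holds at least two records, so the table is 2-anonymous. The first group contains only the value y, so the same table is not 2-diverse.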

Publish to PyPI

$ python -m pip install build hatchling wheel twine
$ python -m build --wheel .
$ python -m twine upload dist/*