datasciencecampus / pprl_toolkit

The privacy-preserving record linkage toolkit: a proof-of-concept public demo of next-gen data linkage techniques.
https://datasciencecampus.github.io/pprl_toolkit/
MIT License

Method to anonymise EmbeddedDataFrame #10

Closed (matweldon closed 4 months ago)

SStock1 commented 4 months ago

@matweldon would you be able to add a description to this?

matweldon commented 4 months ago

The app already does this, so it would just be a convenience method to tidy up the API.

The method would remove every column except 'bf_indices', 'thresholds' and 'bf_norms', plus any columns in a user-provided list of columns to keep:

edf = EmbeddedDataFrame(df, embedder)
edf.anonymise()
# or edf.anonymise(cols_to_keep=['idx']) to keep the 'idx' column
edf.to_json('path/to/file.json')
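
For illustration, the column filtering described above could be sketched roughly as follows. This is a minimal sketch and not the toolkit's implementation: it works on a plain dict of columns as a stand-in for an EmbeddedDataFrame, and `anonymise_columns`, `REQUIRED_COLS` and the sample data are hypothetical names made up for the example.

```python
import json

# Columns the issue says must survive anonymisation.
REQUIRED_COLS = ("bf_indices", "thresholds", "bf_norms")


def anonymise_columns(table, cols_to_keep=None):
    """Return a copy of `table` holding only the required and user-kept columns.

    `table` is a plain dict mapping column name to a list of values. It is a
    hypothetical stand-in for an EmbeddedDataFrame, not the toolkit's own type.
    """
    keep = set(REQUIRED_COLS) | set(cols_to_keep or [])
    # Preserve the original column order while dropping everything else.
    return {name: values for name, values in table.items() if name in keep}


# Made-up sample data: two identifying columns plus the embedding columns.
table = {
    "idx": [0, 1],
    "name": ["Ann Example", "Bob Example"],
    "bf_indices": [[3, 17, 42], [5, 17]],
    "bf_norms": [1.73, 1.41],
    "thresholds": [0.8, 0.75],
}

anonymised = anonymise_columns(table, cols_to_keep=["idx"])
print(list(anonymised))        # surviving column names
print(json.dumps(anonymised))  # what would be written to file
```

The sketch returns a new object instead of modifying its input, which keeps the original data available to the caller. Whether the real method should work in place, as the usage above suggests, is left open here.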