UBRoswellMedPhys / rpc3dl

Deep learning for radiotherapy outcome prediction

Create anonymization module #1

Closed: jasbach closed this issue 1 year ago

jasbach commented 1 year ago

In order to support HIPAA compliance, develop a one-way anonymization module (outcomeDL.anon) that can process DICOM files alongside CSVs containing patient characteristic data and label data. The intent is to be able to run a single program that anonymizes data from all sources simultaneously, assigning each patient a randomized AnonID that preserves the links between that patient's records across sources without creating or saving any sort of mapping key. This way, the anonymization is one-way. (I might also consider adding a reversible mode that saves a mapping key, even if I would not use it for this project.)
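A minimal sketch of how a randomized AnonID can keep records linked across sources without a saved key. This is not the actual contents of `outcomeDL.anon`; the `OneWayAnonymizer` class and its `anon_id` method are hypothetical names. The MRN-to-AnonID dict exists only in memory for the duration of the run, so every DICOM file and CSV row for the same patient gets the same AnonID, but nothing survives the process.

```python
import secrets


class OneWayAnonymizer:
    """Hypothetical sketch: map each MRN to a random AnonID for one run only.

    The mapping is held in memory so all sources processed in the same run
    agree on a patient's AnonID. It is never written to disk, so once the
    process exits the mapping is gone and the anonymization is one-way.
    """

    def __init__(self, id_bytes: int = 8) -> None:
        self._id_bytes = id_bytes
        self._mapping: dict[str, str] = {}
        self._issued: set[str] = set()

    def anon_id(self, mrn: str) -> str:
        """Return the AnonID for an MRN, generating a new random one if unseen."""
        mrn = mrn.strip()
        if mrn not in self._mapping:
            # secrets uses the OS CSPRNG, so the ID carries no information
            # derived from the MRN itself.
            new_id = secrets.token_hex(self._id_bytes)
            while new_id in self._issued:  # guard against (unlikely) collisions
                new_id = secrets.token_hex(self._id_bytes)
            self._mapping[mrn] = new_id
            self._issued.add(new_id)
        return self._mapping[mrn]
```

The IDs are drawn at random rather than computed as an unsalted hash of the MRN: MRNs come from a small, structured space, so a deterministic hash could be reversed by hashing every candidate MRN, which would quietly turn the hash function itself into a mapping key.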

jasbach commented 1 year ago

Finished the script: it can anonymize individual files or recursively walk through a parent folder and anonymize all of its contents into a destination folder. It mirrors the folder structure of the source folder in the destination folder, except that any MRN/IDs in folder names are anonymized as well.
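The recursive walk with folder-name anonymization can be sketched roughly as follows. This is an illustration, not the script from this issue: `mirror_tree` is a hypothetical name, it assumes the MRNs are known up front (e.g. collected from the label CSV) and that an MRN appears as a whole folder name, and it only copies files by default; a real run would pass a `process_file` callable that scrubs DICOM headers instead.

```python
import secrets
import shutil
from pathlib import Path
from typing import Callable, Optional


def mirror_tree(
    src_root: Path,
    dst_root: Path,
    known_mrns: set,
    mapping: Optional[dict] = None,
    process_file: Callable[[Path, Path], object] = shutil.copy2,
) -> dict:
    """Hypothetical sketch: mirror src_root into dst_root, anonymizing MRN folders.

    Any path component exactly equal to a known MRN is replaced by a random
    AnonID. The same MRN always maps to the same AnonID within one call, and
    the returned dict is meant to stay in memory (e.g. to reuse for CSVs in
    the same run), never to be saved.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    mapping = {} if mapping is None else mapping
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue  # directories are recreated implicitly below
        parts = []
        for part in src.relative_to(src_root).parts:
            if part in known_mrns:
                part = mapping.setdefault(part, secrets.token_hex(8))
            parts.append(part)
        dst = dst_root.joinpath(*parts)
        dst.parent.mkdir(parents=True, exist_ok=True)
        process_file(src, dst)
    return mapping
```

Because directories are only created on the way to a file, empty source folders are not mirrored in this sketch.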

It also processes CSVs containing supporting data such as labels or survey responses. This assumes that the CSV fields are sufficiently scrubbed of PII: the only field altered in the CSVs is the MRN/ID field.
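A minimal sketch of the CSV pass, where only the ID column is rewritten and every other column passes through untouched. `anonymize_csv` and the default column name `"MRN"` are assumptions for illustration; `mapping` is assumed to be an in-memory MRN-to-AnonID dict shared with the imaging pass of the same run, so CSV rows line up with the anonymized DICOM folders.

```python
import csv
import io
import secrets


def anonymize_csv(text: str, mapping: dict, id_column: str = "MRN") -> str:
    """Hypothetical sketch: replace only the ID column of a CSV with AnonIDs.

    All other columns are copied unchanged, which assumes they have already
    been scrubbed of PII. Unseen MRNs get a fresh random AnonID that is
    recorded in the in-memory mapping so later sources reuse it.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        mrn = row[id_column].strip()
        row[id_column] = mapping.setdefault(mrn, secrets.token_hex(8))
        writer.writerow(row)
    return out.getvalue()
```

For files on disk, the same function can be fed from `open(path, newline="").read()` and its result written back out with `newline=""`, as the `csv` module documentation recommends.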

I have not yet written the ability to actually save the mapping, if that's desired. This is not critical: for our research we don't want to save the mapping, because we want a fully one-way, irreversible anonymization.
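If a reversible mode is ever wanted, it could be an explicit opt-in step run after anonymization. The sketch below is hypothetical (`save_key` and `load_key` are illustrative names, and the two-column CSV key format is an assumption). It refuses to overwrite an existing key file, and anything it writes re-identifies every patient, so it would need to be stored as PHI.

```python
import csv
from pathlib import Path


def save_key(mapping: dict, key_path: Path) -> None:
    """Hypothetical opt-in: write the MRN->AnonID mapping so a run is reversible.

    Opening with mode "x" raises FileExistsError instead of silently
    overwriting a previous key.
    """
    with Path(key_path).open("x", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["MRN", "AnonID"])
        for mrn, anon in sorted(mapping.items()):
            writer.writerow([mrn, anon])


def load_key(key_path: Path) -> dict:
    """Read a saved key back as an AnonID->MRN dict for re-identification."""
    with Path(key_path).open(newline="") as f:
        return {row["AnonID"]: row["MRN"] for row in csv.DictReader(f)}
```

Keeping key-saving as a separate call means the default one-way path never touches it.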