Add possibility to handle new patient pseudonyms and day shifts generated by PACSMAN's `get_pseudonyms.py`

TranslationalML / tml-ctp

Project to depersonalize medical imaging data

Mozilla Public License 2.0

Add possibility to handle new patient pseudonyms and day shifts generated by PACSMAN's `get_pseudonyms.py` #1

Closed sebastientourbier closed 7 months ago

sebastientourbier commented 7 months ago

Following a discussion with Frederic, we would like to implement the following workflow, which would fit all his needs in practice:

```mermaid
flowchart TB
    id1([PACS])
    id2(pacsman --save)
    id31(get_pseudonyms -m de-id --queryfile)
    id32(get_pseudonyms -m custom --mappingfile)
    id4(tml-ctp)
    id5(add_karnak_tags)
    id6(pacsman --upload)
    id7([Karnak / Kheops])
    id1 --> id2
    id2 --> id31
    id2 --> id32
    id31 --> id4
    id32 --> id4
    id4 --> id5
    id5 --> id6
    id6 --> id7
```

This would require the following code adjustments to tml-ctp:

- take as input the new patient IDs and day shifts generated by PACSMAN's `get_pseudonyms` script
- modify the PatientID and the DATEINC according to the new patient ID and day shift

For now, the PatientID is fixed and is changed randomly after running CTP DAT. The date increment is randomly generated, and the script is modified prior to running CTP DAT.
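The first adjustment could be sketched as below. This is only an illustration under assumptions: the thread does not show PACSMAN's JSON schema, so both files are assumed to be flat objects keyed by the original patient ID, and `load_patient_mapping` is a hypothetical helper, not an existing tml-ctp or PACSMAN function.

```python
import json
from pathlib import Path


def load_patient_mapping(new_ids_path, day_shift_path):
    """Return {original_id: (new_id, day_shift)} from the two JSON files.

    Assumed (hypothetical) layout of both files: a flat JSON object
    keyed by the original patient ID.
    """
    new_ids = json.loads(Path(new_ids_path).read_text())
    day_shifts = json.loads(Path(day_shift_path).read_text())

    # Both files must describe exactly the same set of patients.
    if new_ids.keys() != day_shifts.keys():
        mismatched = sorted(new_ids.keys() ^ day_shifts.keys())
        raise ValueError(f"Patients missing from one file: {mismatched}")

    mapping = {}
    for original_id, new_id in new_ids.items():
        # Reject empty or non-string pseudonyms before they reach the script.
        if not isinstance(new_id, str) or not new_id:
            raise ValueError(f"Invalid new ID for {original_id!r}")
        mapping[original_id] = (new_id, int(day_shifts[original_id]))
    return mapping
```

Each resulting `(new_id, day_shift)` pair would then replace the fixed PatientID and the random DATEINC for the corresponding patient before CTP DAT is run.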

jonasRichiardi commented 7 months ago

Yes, this is great - this way we have built-in compatibility with the Karnak route and consistent pseudonyms and shifts, and we just need to deactivate the depersonalisation step in Karnak.

A few questions:

1. Would the custom mappings just be in the same JSON schema as the new_ids in PACSMAN, so we can validate directly and avoid errors?
2. How do we keep consistency with GPCR/de-ID API patient codes if we have custom codes? Do we just not call the API?

sebastientourbier commented 7 months ago

> A few questions
>
> 1. Would the custom mappings just be in the same JSON schema as the new_ids in PACSMAN, so we can validate directly and avoid errors?

Yes, exactly :smiley:

> 2. How do we keep consistency with GPCR/de-ID API patient codes if we have custom codes? Do we just not call the API?

Actually, Frederic sometimes receives requests from researchers who have their own custom mapping. In this scenario, he does not keep consistency with the GPCR/de-ID API but instead uses the `anonymize_dicoms` script of PACSMAN. Following his practice, the `--mode custom` of `get_pseudonyms` will not call the de-ID API. Instead, it will take as input the custom mappings described in a CSV file and format them according to the same JSON schema as the new_ids in PACSMAN. It should also generate the day_shift JSON. Both files will then be provided to our CTP DAT tool.
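A minimal sketch of what the custom mode could do, under stated assumptions: the CSV column names (`old_id`, `new_id`), the output file names, the flat JSON layout, and the shift range are all placeholders chosen for illustration, and `csv_to_mapping_files` is a hypothetical function rather than PACSMAN's actual implementation.

```python
import csv
import json
import random
from pathlib import Path


def csv_to_mapping_files(csv_path, out_dir, max_shift=30, seed=None):
    """Write new_ids.json and day_shift.json from a custom mapping CSV.

    Assumed (hypothetical) CSV header: old_id,new_id
    """
    rng = random.Random(seed)
    new_ids, day_shifts = {}, {}

    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            old_id = row["old_id"].strip()
            new_id = row["new_id"].strip()
            if old_id in new_ids:
                raise ValueError(f"Duplicate original ID: {old_id!r}")
            new_ids[old_id] = new_id
            # Non-zero random shift in days; the range is a placeholder.
            day_shifts[old_id] = rng.randint(1, max_shift) * rng.choice((-1, 1))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ids_path = out_dir / "new_ids.json"
    shift_path = out_dir / "day_shift.json"
    ids_path.write_text(json.dumps(new_ids, indent=2))
    shift_path.write_text(json.dumps(day_shifts, indent=2))
    return ids_path, shift_path
```

Since the de-ID API is not called in this mode, the day shift has to be produced locally; here it is drawn at random per patient, which is one possible choice among others.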