BCHSI / philter-ucsf

Open source clinical text de-identification
BSD 3-Clause "New" or "Revised" License

De-identified notes not outputting to specific directory #2

Closed: kmuenzen closed this issue 2 years ago.

bayan6060 commented 4 years ago

I have some XML files from the i2b2 data sets. When I try to convert them into plain-text and annotated text files, I only get plain-text files without any PHI annotated. Moreover, the JSON file inside the data folder does not contain any PHI. My XML files have nothing between <TAGS> and </TAGS>. How can I generate PHI tags inside them? Thanks

kmuenzen commented 4 years ago

@bayan6060 If you use the following command, the annotated (i.e. PHI-obscured) text files should be written to the directory specified by the -a flag:

python3 ./generate_dataset/main_ucsf_updated.py -x ./data/i2b2_xml/ -o ./data/phi_notes_i2b2.json -n ./data/i2b2_notes/ -a ./data/i2b2_anno/

If there are no PHI tags specified in your input XML files (or the tags are not formatted correctly), the notes in the annotated folder will appear to be unannotated since there is technically no PHI to obscure.
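To see why an empty <TAGS> block produces output identical to the input, here is a minimal sketch of offset-based PHI masking. The `obscure` helper, the `*` mask character, and the (start, end) span format are hypothetical illustrations, not the actual implementation in `main_ucsf_updated.py`. The point is only that masking driven by tag offsets has nothing to do when the tag list is empty.

```python
from typing import List, Tuple


def obscure(text: str, spans: List[Tuple[int, int]], mask: str = "*") -> str:
    """Replace every character inside each (start, end) span with `mask`.

    Hypothetical helper: the mask character and span format are assumptions,
    not philter-ucsf's actual output format. Length is preserved so the
    character offsets of the remaining text do not shift.
    """
    chars = list(text)
    for start, end in spans:
        # Clamp to the note's bounds so out-of-range offsets cannot raise.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


note = "Seen by Dr. Smith on 2067-05-03."
print(obscure(note, [(12, 17), (21, 31)]))  # Seen by Dr. ***** on **********.
print(obscure(note, []) == note)            # True: no tags, nothing to obscure
```

Preserving the note's length (rather than substituting a single placeholder token) keeps any later offsets valid, which is why a per-character mask was chosen for this sketch.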

Have you made sure that the tag format of your input XML files is the same as those in the example data/i2b2_xml/ folder in this repository?
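One way to check your own files before running the script is a small validator like the sketch below. It assumes the i2b2 2014 de-identification layout (a root element such as `deIdi2b2` holding `<TEXT>` and `<TAGS>`, with each PHI tag carrying `start`, `end`, and `text` attributes whose offsets index into the `<TEXT>` content). That layout is an assumption here, so compare it against the files in data/i2b2_xml/ first. The helpers `phi_tags`, `bad_offsets`, and `check_dir` are hypothetical and not part of philter-ucsf.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def phi_tags(root):
    """Return the PHI elements under <TAGS>, or [] if <TAGS> is missing or empty."""
    tags = root.find("TAGS")
    return [] if tags is None else list(tags)


def bad_offsets(root):
    """Return names of PHI tags whose start/end do not select their `text` in <TEXT>."""
    note = root.findtext("TEXT", default="")
    bad = []
    for el in phi_tags(root):
        try:
            start, end = int(el.get("start")), int(el.get("end"))
        except (TypeError, ValueError):  # attribute missing or not an integer
            bad.append(el.tag)
            continue
        if note[start:end] != el.get("text"):
            bad.append(el.tag)
    return bad


def check_dir(directory):
    """Map each *.xml file to (number of PHI tags, names of tags with bad offsets)."""
    report = {}
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        report[path.name] = (len(phi_tags(root)), bad_offsets(root))
    return report


# Made-up example note in the assumed i2b2 2014 layout.
sample = """<deIdi2b2>
<TEXT><![CDATA[Seen by Dr. Smith on 2067-05-03.]]></TEXT>
<TAGS>
<NAME id="P0" start="12" end="17" text="Smith" TYPE="DOCTOR" comment="" />
<DATE id="P1" start="21" end="31" text="2067-05-03" TYPE="DATE" comment="" />
</TAGS>
</deIdi2b2>"""
root = ET.fromstring(sample)
print(len(phi_tags(root)), bad_offsets(root))  # 2 []
```

Running `check_dir("./data/i2b2_xml/")` on your own input would then flag files that report 0 tags (an empty `<TAGS>` block, as described in the question) or list tags whose offsets do not line up with the note text.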

RedChrists commented 2 years ago

Closing due to inactivity.