BCHSI / philter-ucsf

Open source clinical text de-identification
BSD 3-Clause "New" or "Revised" License

Issues reproducing Precision/Recall/F1/F2 on the i2b2 dataset #11

Open soulaven opened 3 years ago

soulaven commented 3 years ago

Hi,

Thank you for developing and releasing this package. I followed steps 0, 2a, 1b, and 1c using the PHI config file, and then step 2d with prod=True. To calculate the scores, following my understanding of the paper, I split all PHI text at the word level and sanitized edge cases such as "," and "." at the ends of words (without this, the scores are much lower). However, I was only able to achieve Precision 0.696, Recall 0.915, F1 0.791, and F2 0.861 on the test set, which is some distance from the statistics the paper reports for the i2b2 test set. I am most likely missing something, but I am unsure what it is.
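For reference, the word-level scoring described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not Philter's own evaluation code: the helper names (`to_tokens`, `f_beta`, `score`) are hypothetical, PHI is assumed to be available as plain lists of gold and predicted strings, and tokens are matched as a bag of words per note, ignoring character positions.

```python
import string
from collections import Counter


def to_tokens(phi_spans):
    """Split PHI strings into words and strip punctuation from word edges."""
    tokens = []
    for span in phi_spans:
        for word in span.split():
            cleaned = word.strip(string.punctuation)  # "Smith," -> "Smith"
            if cleaned:
                tokens.append(cleaned)
    return tokens


def f_beta(precision, recall, beta):
    """F-beta score; beta=1 gives F1, beta=2 weights recall more heavily."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score(gold_spans, predicted_spans):
    """Word-level precision/recall/F1/F2 (assumption: bag-of-words matching)."""
    gold = Counter(to_tokens(gold_spans))
    pred = Counter(to_tokens(predicted_spans))
    tp = sum((gold & pred).values())   # tokens present in both multisets
    fp = sum(pred.values()) - tp       # predicted as PHI but not in gold
    fn = sum(gold.values()) - tp       # gold PHI that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f_beta(precision, recall, 1),
        "f2": f_beta(precision, recall, 2),
    }


if __name__ == "__main__":
    # Made-up strings for illustration only, not i2b2 data.
    print(score(["John Smith,", "Boston."], ["John", "Smith", "Boston", "Monday"]))
```

Plugging the precision and recall above into `f_beta` gives F1 and F2 values that round to 0.791 and 0.861, so the four numbers are at least internally consistent; any gap from the paper would then lie in how tokens are counted or in which tags count as PHI.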

RedChrists commented 2 years ago

In addition to step 0, a manual review of the results may be necessary to confirm that the missed i2b2 tags are actual PHI according to HIPAA.