NIHOPA / NLPre

Python library for Natural Language Preprocessing (NLPre)

Add unit tests to parenthetical phrases #12

Closed: thoppe closed this issue 7 years ago

thoppe commented 7 years ago

Please add unit tests to the new module (from #11). I expect there are a lot of cases that should work but don't (hence the tests!). One known issue is a phrase that contains "and":

The Environmental Protection Agency (EPA) is not a government organization (GO) of 
Health and Human Services (HHS).

The module finds only the first two phrases and misses HHS:

Counter({(('government', 'organization'), 'GO'): 1, (('Environmental', 'Protection', 'Agency'), 'EPA'): 1})
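A rough sketch of what such tests could look like. `find_abbreviations` and its `skip_words` parameter are hypothetical stand-ins written only so the example runs on its own; they are not the module's actual code or API, and whether "and" should be kept inside the matched phrase is an assumption.

```python
import re
from collections import Counter

SENTENCE = (
    "The Environmental Protection Agency (EPA) is not a government "
    "organization (GO) of Health and Human Services (HHS)."
)


def find_abbreviations(text, skip_words=()):
    """Hypothetical stand-in: count (phrase_words, abbreviation) pairs.

    Walks backwards from each "(ABBR)" token and matches one word initial
    per abbreviation letter. Words listed in skip_words are kept in the
    phrase but do not consume a letter.
    """
    found = Counter()
    tokens = re.findall(r"\(\w+\)|\w+", text)
    for i, tok in enumerate(tokens):
        if not tok.startswith("("):
            continue
        abbr = tok[1:-1]
        letters = list(abbr.lower())
        words = []
        j = i - 1
        while j >= 0 and letters:
            word = tokens[j]
            if word.startswith("("):
                break  # ran into a previous parenthetical
            if word.lower() in skip_words:
                words.append(word)
            elif word[0].lower() == letters[-1]:
                letters.pop()
                words.append(word)
            else:
                break  # initial does not match the next expected letter
            j -= 1
        if not letters:
            found[(tuple(reversed(words)), abbr)] += 1
    return found


def test_simple_phrases_are_found():
    found = find_abbreviations(SENTENCE)
    assert found[(("Environmental", "Protection", "Agency"), "EPA")] == 1
    assert found[(("government", "organization"), "GO")] == 1


def test_phrase_containing_and():
    # Known issue: with no skip words, "and" blocks the HHS match.
    assert not any(abbr == "HHS" for _, abbr in find_abbreviations(SENTENCE))
    # Assumed fix: treat "and" as a connector that is kept in the phrase.
    found = find_abbreviations(SENTENCE, skip_words={"and"})
    assert found[(("Health", "and", "Human", "Services"), "HHS")] == 1
```

Both functions follow the pytest naming convention, so they could be collected as-is. In the real test the stand-in would be replaced by whatever the module from #11 exposes.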

thoppe commented 7 years ago

A list of currently found abbreviations:

abbreviations.csv.zip

Sample

Alzheimers disease,AD,3939
reactive oxygen species,ROS,3711
central nervous system,CNS,2595
magnetic resonance imaging,MRI,2214
endoplasmic reticulum,ER,2030
genome wide association studies,GWAS,1964
Parkinsons disease,PD,1843
nitric oxide,NO,1710
wild type,WT,1507
multiple sclerosis,MS,1411
blood brain barrier,BBB,1336
single nucleotide polymorphisms,SNPs,1298
heart failure,HF,1242
magnetic resonance imaging,fMRI,1162
chronic kidney disease,CKD,1081
traumatic brain injury,TBI,1020
positron emission tomography,PET,1017
autism spectrum disorders,ASD,1000
dendritic cells,DCs,982
vascular endothelial growth factor,VEGF,967
chronic obstructive pulmonary disease,COPD,956
prostate cancer,PCa,940
myocardial infarction,MI,926
diffusion tensor imaging,DTI,912
randomized controlled trial,RCT,896
bone marrow,BM,863
body mass index,BMI,848
endothelial cells,ECs,828
Toll like receptor,TLR,825
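
The list could also be a source of test cases. Below is a minimal sketch that flags rows whose word initials do not spell the abbreviation. It assumes the three columns are phrase, abbreviation, and count (the sample has no header row), and `initials_match` and `suspicious_rows` are hypothetical helper names, not part of NLPre.

```python
import csv
import io

# A few rows copied from the sample above.
SAMPLE = """\
Alzheimers disease,AD,3939
magnetic resonance imaging,MRI,2214
magnetic resonance imaging,fMRI,1162
single nucleotide polymorphisms,SNPs,1298
prostate cancer,PCa,940
"""


def initials_match(phrase, abbr):
    """True if the word initials spell abbr, ignoring case and a plural 's'."""
    initials = "".join(word[0] for word in phrase.split()).lower()
    abbr = abbr.lower()
    return initials == abbr or (abbr.endswith("s") and initials == abbr[:-1])


def suspicious_rows(lines):
    """Return (phrase, abbreviation) pairs that fail the initials check."""
    return [
        (phrase, abbr)
        for phrase, abbr, _count in csv.reader(lines)
        if not initials_match(phrase, abbr)
    ]


flagged = suspicious_rows(io.StringIO(SAMPLE))
```

On these five rows the check flags the `fMRI` and `PCa` entries. A flagged row is not necessarily wrong, only a candidate worth covering with a unit test.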