Open davidchan-emtelligent opened 6 years ago
The classifier model has the following confusion matrix:
comma period
[[1177 126]
[ 221 4740]]
Accuracy: 94.46%
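As a quick sanity check, the reported accuracy can be recomputed from the matrix above. This is only a sketch: the issue does not say which axis holds the true labels, so it computes only the accuracy, which does not depend on that choice. The variable names are illustrative.

```python
# Confusion matrix as reported in the issue (classes: comma, period).
# Which axis holds the true labels is not stated, so only the
# axis-independent accuracy is computed here.
matrix = [[1177, 126],
          [221, 4740]]

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
accuracy = correct / total

print(f"Accuracy: {accuracy:.2%}")
```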
The classifier does not give good results in practice. As @oconnelltim pointed out:
'Newly diagnosed GBM as above
otherwise, none'
gets turned into:
1. Newly diagnosed GBM as above.
2. otherwise.
3. none.
Commas and semicolons are important as separators. They account for > 10% (26257/228504) of our data set. It seems better not to use the classifier. I am keeping the code inside repunct and can remove it after a while.
The rules for using a comma/semicolon as a separator are:
Case 1:
No period. For example:
original_file_idx:97974, 141
------------(input)
PMH:
Amyloidosis, depression, kidney stones, hx of tubal ligation, L
hip replacement
------------(output)
# PMH:
Case 2:
Only one period at the end. For example:
original_file_idx:97972, 137
------------(input)
Amyloidosis, depression, kidney
stones.
------------(output)
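The two cases above can be sketched as a small rule check. This is a minimal illustration, not the code inside repunct: the function names `use_comma_as_separator` and `split_items` are hypothetical, and counting every "." as a period is a simplifying assumption (abbreviations and decimals would need extra handling).

```python
import re


def use_comma_as_separator(segment: str) -> bool:
    """Return True when commas/semicolons should be treated as item separators."""
    text = segment.strip()
    periods = text.count(".")  # naive: every "." counts as a period
    if periods == 0:
        return True  # Case 1: no period
    # Case 2: exactly one period, and it ends the segment
    return periods == 1 and text.endswith(".")


def split_items(segment: str) -> list[str]:
    """Split a segment on commas/semicolons when the rules allow it."""
    if not use_comma_as_separator(segment):
        return [segment.strip()]
    body = segment.strip().rstrip(".")
    # Collapse line breaks inside an item into single spaces.
    return [" ".join(part.split()) for part in re.split(r"[,;]", body) if part.strip()]


print(split_items("Amyloidosis, depression, kidney\nstones."))
```

A segment with more than one period, or with a period that is not at the end, is returned unsplit under this sketch.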
[' \*\*]\n[\*\*Last Name (NamePattern0) \*\*]\nHRS/[\*\*Last Name (un) 00\*\*]\nMalnutrition\n\n\nDischarge Condition:\nMental Stat']
[' \*\*]\n[\*\*Last Name (NamePattern0) \*\*]\nHRS/Malnutrition\n\n\nDischarge Condition:\nMental Stat']
['First Name0 (LF) 0000\*\*] S. MD\nLocation: [\*\*Last Name (un) \*\*] DIABETES CENTER\nAddress: ONE [\*\*Last Name (']
['First Name0 (LF) 0000\*\*] S. MD\nLocation: DIABETES CENTER\nAddress: ONE [\*\*Last Name (']
['000\*\*] with subsequent implantation of a\n[\*\*Company 0000\*\*] [\*\*Last Name (un) \*\*] ICD with a Sprint Fidelis 0000 lead, w']
['000\*\*] with subsequent implantation of a\nICD with a Sprint Fidelis 0000 lead, w']
['OURSE: On [\*\*0-0\*\*], he underwent an LV/[\*\*Name Prefix (Prefixes) \*\*]\n[\*\*Last Name (Prefixes) 0000\*\*] ICD lead placement and a generator exc']
['OURSE: On [\*\*0-0\*\*], he underwent an LV/ICD lead placement and a generator exc']
[' History of cellulitis.\n00. History of [\*\*Last Name (un) 00000\*\*] syndrome.\n\nMEDICATIONS ON ADMISSION: Medic']
[' History of cellulitis.\n00. History of syndrome.\n\nMEDICATIONS ON ADMISSION: Medic']
['e Renal Failure\nAtrial Fibrillation with [\*\*First Name0 (NamePattern0) 0000\*\*]\n[\*\*Last Name (un) 00000\*\*] Syndrome / Colonic Pseudoobstruction\nRenal ']
['e Renal Failure\nAtrial Fibrillation with Syndrome / Colonic Pseudoobstruction\nRenal ']
['long\nthe spectrum of erythema multiforme/[\*\*First Name0 (NamePattern0) 000\*\*] [\*\*Last Name (NamePattern0) 0000\*\*] syndrome.\nHSV culture of the erosions prese']
['long\nthe spectrum of erythema multiforme/syndrome.\nHSV culture of the erosions prese']
['woman with acute mental status changes s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*] aneurysm repair,\n history of left sided stroke no']
['woman with acute mental status changes s/aneurysm repair,\n history of left sided stroke no']
[' no\n diaphraghmatic defect seen.\n 0. s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] gastric bypass and cholecystectomy.\n ____________']
[' no\n diaphraghmatic defect seen.\n 0. s/gastric bypass and cholecystectomy.\n ____________']
['has a [\*\*Name (NI) 00000\*\*] syndrome and [\*\*Last Name (un) 00000\*\*]\nsyndrome. She was initially found to have a']
['has a [\*\*Name (NI) 00000\*\*] syndrome and syndrome. She was initially found to have a']
['Radiology) 00000\*\*]\n Reason: concern for [\*\*Last Name (un) 00000\*\*] gangrene\n _________________________________']
['Radiology) 00000\*\*]\n Reason: concern for gangrene\n _________________________________']
['YING MEDICAL CONDITION:\n ~[\*\*00-00\*\*] s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 000\*\*] angioplasty, SMA angioplasty, Ex-lap and small']
['YING MEDICAL CONDITION:\n ~[\*\*00-00\*\*] s/angioplasty, SMA angioplasty, Ex-lap and small']
['cement with 00mm SJM Biocor Tissue valve/[\*\*Name Prefix (Prefixes) \*\*]\n[\*\*Last Name (Prefixes) 0000\*\*] resection on [\*\*0000-0-00\*\*]\n\nHistory of Pre']
['cement with 00mm SJM Biocor Tissue valve/resection on [\*\*0000-0-00\*\*]\n\nHistory of Pre']
['arcinomas\n# Congestive heart failure\n# H/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] infection\n\n\nSocial History:\nMarried and has ']
['arcinomas\n# Congestive heart failure\n# H/infection\n\n\nSocial History:\nMarried and has ']
['c gland polyp.\n\nOSH: [\*\*0000-0-00\*\*] EGD/[\*\*Last Name (un) 0000\*\*] procedure report:\n-EGD: small hiatal hernia,']
['c gland polyp.\n\nOSH: [\*\*0000-0-00\*\*] EGD/procedure report:\n-EGD: small hiatal hernia,']
[' Tessio cath. He also developed necrotic/[\*\*Last Name (un) 00000\*\*] gangrene\nof toes b/l s/p surgery, no interv']
[' Tessio cath. He also developed necrotic/gangrene\nof toes b/l s/p surgery, no interv']
['ystolic Congestive Heart failure: no ACE/[\*\*Last Name (un) \*\*]\n[\*\*0-00\*\*] [\*\*Last Name (un) \*\*]\nAcute renal failure\nAortic Stenosis\nAnemia\nObstuctive ']
['ystolic Congestive Heart failure: no ACE/Acute renal failure\nAortic Stenosis\nAnemia\nObstuctive ']
['nsion\nhypercholestremia\nchronic low back [\*\*Last Name (un) 00000\*\*]\npeptic ulcer disease by EGD\ndiverticulosis\n\n\nDi']
['nsion\nhypercholestremia\nchronic low back peptic ulcer disease by EGD\ndiverticulosis\n\n\nDi']
['me 0000\*\*] is a 00 year old woman with h/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*] syndrome,\nshort gut syndrome, prior line in']
['me 0000\*\*] is a 00 year old woman with h/syndrome,\nshort gut syndrome, prior line in']
['cirrhosis s/p liver-kidney transplant, h/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*]\n osteomyelitis, now w/ CNS/strep viridian bacte']
['cirrhosis s/p liver-kidney transplant, h/ osteomyelitis, now w/ CNS/strep viridian bacte']
['cal atrophy,\nneurogenic bladder plus Shy-[\*\*Last Name (un) 00000\*\*] syndrome who was visiting\n[\*\*State 000\*\*] w']
['cal atrophy,\nneurogenic bladder plus Shy-syndrome who was visiting\n[\*\*State 000\*\*] w']
['edure.\n REASON FOR THIS EXAMINATION:\n R/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 000\*\*] abscess. Need drainage if abscess found.\n ']
['edure.\n REASON FOR THIS EXAMINATION:\n R/abscess. Need drainage if abscess found.\n ']
['gen elevated. Vitamin K was given.\n.\n# S/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] resection. Pain regimen. Surgery was consult']
['gen elevated. Vitamin K was given.\n.\n# S/resection. Pain regimen. Surgery was consult']
['.D. [\*\*MD Number(0) 0000\*\*]\n\nDictated By:[\*\*Last Name (NamePattern0) 00000\*\*]\nEGD report([\*\*0000-0-00\*\*]):\nFindings:']
['.D. [\*\*MD Number(0) 0000\*\*]\n\nDictated By:EGD report([\*\*0000-0-00\*\*]):\nFindings:']
['ry Failure\nPulmonary Edema\nRenal Failure\n[\*\*Last Name (un) 00000\*\*] Syndrome\nAtrial Fibrillation\n\n","Admission ']
['ry Failure\nPulmonary Edema\nRenal Failure\nSyndrome\nAtrial Fibrillation\n\n","Admission ']
Repunct can only handle a single level of parentheses () with a maximum length of 185 chars (any length can be configured); nested parentheses are not detected. For example,
One thing we can do is tell Repunct that (NamePattern?) is not a valid parenthesis. The result is:
The change is in issue1_2_3.
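The single-level parenthesis limit described above can be illustrated with a regular expression. This is a sketch under assumptions, not Repunct's actual code: the names `PAREN_RE`, `NAME_PATTERN_RE` and `find_parentheses` are hypothetical, the 185-char limit is assumed to apply to the text between the parentheses, and "(NamePattern?)" is assumed to mean "NamePattern" followed by optional digits.

```python
import re

MAX_PAREN_LEN = 185  # limit quoted in the issue; described there as configurable

# "(" ... ")" with no parentheses inside, so nested groups are never matched whole.
PAREN_RE = re.compile(r"\(([^()]{0,%d})\)" % MAX_PAREN_LEN)

# Assumed form of the "(NamePattern?)" placeholder content, e.g. "NamePattern0".
NAME_PATTERN_RE = re.compile(r"NamePattern\d*$")


def find_parentheses(text: str) -> list[str]:
    """Return the contents of valid single-level parentheses in text."""
    return [
        m.group(1)
        for m in PAREN_RE.finditer(text)
        if not NAME_PATTERN_RE.match(m.group(1))  # skip de-identification tags
    ]


print(find_parentheses("pain (mild) and [**Last Name (NamePattern0) 000**] syndrome"))
print(find_parentheses("a (b (c) d) e"))
```

In the second call only the inner group is found, which shows why nested parentheses are missed by this kind of pattern.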