github2015david / gpu_funct


issue1 #4

Open davidchan-emtelligent opened 6 years ago

davidchan-emtelligent commented 6 years ago

Repunct can only handle a single level of parentheses () with a maximum length of 185 characters (the limit can be set to any length); nested parentheses are not detected. For example:

------------input------------original_file_idx:98499, 1375

AIDS (Dr. [**First Name8 (NamePattern2) **] [**Last Name (NamePattern1) 4049**] [**Hospital 1541**] Medical Center),

Hepatitis C, CD4 in 20's, viral load 190,000, thrombocytopenia,

recently seen by Hem/Onc at OSH, depression, hypertension,

ureteral implants, colposcopy, IV drug use, rotator cuff injury,


------------output------------

1. AIDS (Dr.
2. [**First Name8 (NamePattern2) **] [**Last Name (NamePattern1) 4049**] [**Hospital 1541**] Medical Center) , Hepatitis C, CD4 in 20's, viral load 190, 000, thrombocytopenia, recently seen by   Hem/Onc at OSH, depression, hypertension, ureteral implants, colposcopy, IV drug use, rotator cuff injury, gallstones.

One thing we can do is tell Repunct that (NamePattern?) is not a valid parenthesis. The result is:

1. AIDS (Dr. [**First Name8 (NamePattern2) **] [**Last Name (NamePattern1) 4049**] [**Hospital 1541**] Medical Center).
2. Hepatitis C.
3. CD4 in 20's.
4. viral load 190,000.
5. thrombocytopenia.
6. recently seen by Hem/Onc at OSH.
7. depression.
8. hypertension.
9. ureteral implants.
10. colposcopy.
11. IV drug use.
12. rotator cuff injury.
13. gallstones.

The change is in issue1_2_3.
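
A minimal sketch of the idea, not the actual code in issue1_2_3: mask every `(NamePattern?)` tag before looking for parentheses, so the detector only sees the outer pair. The names `mask_name_patterns` and `find_parentheses` are hypothetical, and reading `?` as "any digits" is an assumption.

```python
import re

# Assumption: "(NamePattern?)" means "(NamePattern" followed by optional digits.
NAME_PATTERN = re.compile(r"\(NamePattern\d*\)")
# A single, non-nested pair of parentheses.
PAREN = re.compile(r"\([^()]*\)")


def mask_name_patterns(text):
    """Replace each (NamePattern?) tag with same-length filler.

    Keeping the length unchanged means match offsets in the masked
    text are still valid offsets in the original text.
    """
    return NAME_PATTERN.sub(lambda m: "_" * len(m.group(0)), text)


def find_parentheses(text, max_len=185):
    """Return parenthesized spans of the original text, ignoring (NamePattern?)."""
    masked = mask_name_patterns(text)
    return [
        text[m.start():m.end()]
        for m in PAREN.finditer(masked)
        if m.end() - m.start() <= max_len
    ]


example = (
    "AIDS (Dr. [**First Name8 (NamePattern2) **] "
    "[**Last Name (NamePattern1) 4049**] [**Hospital 1541**] Medical Center),"
)
spans = find_parentheses(example)
```

On the example input, the unmasked pattern only finds the two inner `(NamePattern…)` tags, while the masked version finds the whole `(Dr. … Medical Center)` span.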

davidchan-emtelligent commented 6 years ago

The classifier model has:

confusion matrix:
 comma period
[[1177  126]
 [ 221 4740]]
Accuracy: 94.46%
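
For reference, the accuracy can be recomputed from the matrix as below. This is only a sketch: the per-class helper assumes rows are true labels and columns are predictions, which the output above does not state.

```python
# Confusion matrix as reported above; class order is comma, period.
matrix = [[1177, 126],
          [221, 4740]]
labels = ["comma", "period"]


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm, names):
    """Diagonal over row sum. Assumes rows are the true labels."""
    return {name: cm[i][i] / sum(cm[i]) for i, name in enumerate(names)}


acc = accuracy(matrix)
recall = per_class_recall(matrix, labels)
print(f"Accuracy: {acc:.2%}")
```

Under that row/column assumption, the comma class is the weaker of the two, which matters because the comma decision is what drives the splitting.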

The results are not good when this classifier is used. As @oconnelltim pointed out:

'Newly diagnosed GBM as above
otherwise, none' 
gets turned into:
1. Newly diagnosed GBM as above.
2. otherwise.
3. none.

Commas and semicolons as separators are important: they account for more than 10% (26257/228504) of our data set. It seems better not to use the classifier. I'm keeping the code inside repunct and can remove it after a while.

The rules for using a comma/semicolon as a separator are:

Case 1:

  1. n_comma/n_semi-colon > 0,
  2. No bullet point,
  3. No period. For example:

    original_file_idx:97974, 141-----------(input) PMH:

    Amyloidosis, depression, kidney stones, hx of tubal ligation, L

    hip replacement

    ------------(output) # PMH:

    1. Amyloidosis.
    2. depression.
    3. kidney stones.
    4. hx of tubal ligation.
    5. L hip replacement

Case 2:

  1. n_comma/n_semi-colon > 1,
  2. No bullet point,
  3. Only one period at the end. For example:

    original_file_idx:97972, 137------------(input) Amyloidosis, depression, kidney

    stones. ------------(output)

    1. Amyloidosis.
    2. depression.
    3. kidney stones
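
The two cases can be sketched roughly as follows. This is an illustration under stated assumptions, not the repunct code: `should_split` and `split_items` are hypothetical names, `n_comma/n_semi-colon` is read as the combined count of commas and semicolons, the bullet regex is a guess at what counts as a bullet point, and commas directly followed by a digit (as in 190,000) are assumed not to be separators.

```python
import re

# Assumption: a bullet is a line starting with -, *, • or a number like "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
# Assumption: a comma or semicolon followed by a digit is part of a number.
SEPARATOR = re.compile(r"[,;](?!\d)")


def should_split(text):
    """Apply Case 1 / Case 2 to decide whether commas act as item separators."""
    if BULLET.search(text):
        return False  # rule 2 in both cases: no bullet point
    body = " ".join(text.split())  # join wrapped lines
    n_sep = len(SEPARATOR.findall(body))
    n_period = body.count(".")
    if n_period == 0:
        return n_sep > 0  # Case 1: at least one separator, no period
    if n_period == 1 and body.endswith("."):
        return n_sep > 1  # Case 2: more than one separator, one final period
    return False


def split_items(text):
    """Split on separators, dropping the single trailing period if present."""
    body = " ".join(text.split()).rstrip(".")
    return [part.strip() for part in SEPARATOR.split(body) if part.strip()]


case1 = "Amyloidosis, depression, kidney stones, hx of tubal ligation, L\n\nhip replacement"
case2 = "Amyloidosis, depression, kidney\n\nstones."
```

Counting periods with `str.count` is deliberately naive: abbreviations such as "Dr." would count as periods and block the split, so a real implementation would need a stricter period test.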

davidchan-emtelligent commented 6 years ago

[' \*\*]\n[\*\*Last Name (NamePattern0) \*\*]\nHRS/[\*\*Last Name (un) 00\*\*]\nMalnutrition\n\n\nDischarge Condition:\nMental Stat']
[' \*\*]\n[\*\*Last Name (NamePattern0) \*\*]\nHRS/Malnutrition\n\n\nDischarge Condition:\nMental Stat']

['First Name0 (LF) 0000\*\*] S. MD\nLocation: [\*\*Last Name (un) \*\*] DIABETES CENTER\nAddress: ONE [\*\*Last Name (']
['First Name0 (LF) 0000\*\*] S. MD\nLocation: DIABETES CENTER\nAddress: ONE [\*\*Last Name (']

['000\*\*] with subsequent implantation of a\n[\*\*Company 0000\*\*] [\*\*Last Name (un) \*\*] ICD with a Sprint Fidelis 0000 lead, w']
['000\*\*] with subsequent implantation of a\nICD with a Sprint Fidelis 0000 lead, w']

['OURSE:  On [\*\*0-0\*\*], he underwent an LV/[\*\*Name Prefix (Prefixes) \*\*]\n[\*\*Last Name (Prefixes) 0000\*\*] ICD lead placement and a generator exc']
['OURSE:  On [\*\*0-0\*\*], he underwent an LV/ICD lead placement and a generator exc']

['  History of cellulitis.\n00.  History of [\*\*Last Name (un) 00000\*\*] syndrome.\n\nMEDICATIONS ON ADMISSION:  Medic']
['  History of cellulitis.\n00.  History of syndrome.\n\nMEDICATIONS ON ADMISSION:  Medic'] 

['e Renal Failure\nAtrial Fibrillation with [\*\*First Name0 (NamePattern0) 0000\*\*]\n[\*\*Last Name (un) 00000\*\*] Syndrome / Colonic Pseudoobstruction\nRenal ']
['e Renal Failure\nAtrial Fibrillation with Syndrome / Colonic Pseudoobstruction\nRenal ']

['long\nthe spectrum of erythema multiforme/[\*\*First Name0 (NamePattern0) 000\*\*] [\*\*Last Name (NamePattern0) 0000\*\*] syndrome.\nHSV culture of the erosions prese']
['long\nthe spectrum of erythema multiforme/syndrome.\nHSV culture of the erosions prese'] 

['woman with acute mental status changes s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*] aneurysm repair,\n  history of left sided stroke no']
['woman with acute mental status changes s/aneurysm repair,\n  history of left sided stroke no'] 

[' no\n  diaphraghmatic defect seen.\n  0. s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] gastric bypass and cholecystectomy.\n ____________']
[' no\n  diaphraghmatic defect seen.\n  0. s/gastric bypass and cholecystectomy.\n ____________']

['has a [\*\*Name (NI) 00000\*\*] syndrome and [\*\*Last Name (un) 00000\*\*]\nsyndrome. She was initially found to have a']
['has a [\*\*Name (NI) 00000\*\*] syndrome and syndrome. She was initially found to have a']

['Radiology) 00000\*\*]\n Reason: concern for [\*\*Last Name (un) 00000\*\*] gangrene\n _________________________________']
['Radiology) 00000\*\*]\n Reason: concern for gangrene\n _________________________________']

['YING MEDICAL CONDITION:\n  ~[\*\*00-00\*\*] s/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 000\*\*] angioplasty, SMA angioplasty, Ex-lap and small']
['YING MEDICAL CONDITION:\n  ~[\*\*00-00\*\*] s/angioplasty, SMA angioplasty, Ex-lap and small']

['cement with 00mm SJM Biocor Tissue valve/[\*\*Name Prefix (Prefixes) \*\*]\n[\*\*Last Name (Prefixes) 0000\*\*] resection on [\*\*0000-0-00\*\*]\n\nHistory of Pre']
['cement with 00mm SJM Biocor Tissue valve/resection on [\*\*0000-0-00\*\*]\n\nHistory of Pre']

['arcinomas\n# Congestive heart failure\n# H/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] infection\n\n\nSocial History:\nMarried and has ']
['arcinomas\n# Congestive heart failure\n# H/infection\n\n\nSocial History:\nMarried and has ']

['c gland polyp.\n\nOSH: [\*\*0000-0-00\*\*] EGD/[\*\*Last Name (un) 0000\*\*] procedure report:\n-EGD: small hiatal hernia,']
['c gland polyp.\n\nOSH: [\*\*0000-0-00\*\*] EGD/procedure report:\n-EGD: small hiatal hernia,'] 

[' Tessio cath. He also developed necrotic/[\*\*Last Name (un) 00000\*\*] gangrene\nof toes b/l s/p surgery, no interv']
[' Tessio cath. He also developed necrotic/gangrene\nof toes b/l s/p surgery, no interv']

['ystolic Congestive Heart failure: no ACE/[\*\*Last Name (un) \*\*]\n[\*\*0-00\*\*] [\*\*Last Name (un) \*\*]\nAcute renal failure\nAortic Stenosis\nAnemia\nObstuctive ']
['ystolic Congestive Heart failure: no ACE/Acute renal failure\nAortic Stenosis\nAnemia\nObstuctive ']

['nsion\nhypercholestremia\nchronic low back [\*\*Last Name (un) 00000\*\*]\npeptic ulcer disease by EGD\ndiverticulosis\n\n\nDi']
['nsion\nhypercholestremia\nchronic low back peptic ulcer disease by EGD\ndiverticulosis\n\n\nDi']

['me 0000\*\*] is a 00 year old woman with h/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*] syndrome,\nshort gut syndrome, prior line in']
['me 0000\*\*] is a 00 year old woman with h/syndrome,\nshort gut syndrome, prior line in']

['cirrhosis s/p liver-kidney transplant, h/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 00000\*\*]\n  osteomyelitis, now w/ CNS/strep viridian bacte']
['cirrhosis s/p liver-kidney transplant, h/  osteomyelitis, now w/ CNS/strep viridian bacte']

['cal atrophy,\nneurogenic bladder plus Shy-[\*\*Last Name (un) 00000\*\*] syndrome who was visiting\n[\*\*State 000\*\*] w']
['cal atrophy,\nneurogenic bladder plus Shy-syndrome who was visiting\n[\*\*State 000\*\*] w'] 

['edure.\n REASON FOR THIS EXAMINATION:\n  R/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) 000\*\*] abscess. Need drainage if abscess found.\n ']
['edure.\n REASON FOR THIS EXAMINATION:\n  R/abscess. Need drainage if abscess found.\n ']

['gen elevated. Vitamin K was given.\n.\n# S/[\*\*Initials (NamePattern0) \*\*] [\*\*Last Name (NamePattern0) \*\*] resection. Pain regimen. Surgery was consult']
['gen elevated. Vitamin K was given.\n.\n# S/resection. Pain regimen. Surgery was consult']

['.D. [\*\*MD Number(0) 0000\*\*]\n\nDictated By:[\*\*Last Name (NamePattern0) 00000\*\*]\nEGD report([\*\*0000-0-00\*\*]):\nFindings:']
['.D. [\*\*MD Number(0) 0000\*\*]\n\nDictated By:EGD report([\*\*0000-0-00\*\*]):\nFindings:']

['ry Failure\nPulmonary Edema\nRenal Failure\n[\*\*Last Name (un) 00000\*\*] Syndrome\nAtrial Fibrillation\n\n","Admission ']
['ry Failure\nPulmonary Edema\nRenal Failure\nSyndrome\nAtrial Fibrillation\n\n","Admission ']