biosemantics / micropie2


antibiotics #43

Closed carrineblank closed 8 years ago

carrineblank commented 8 years ago

Hi Jin,

For antibiotic sensitivity/resistance:

Could the output compounds be alphabetized? This would greatly help for coding the matrix!

"gentamicin#following antibiotics : amikacin#kanamycin#polymyxin B 0 IU#tetracycline#following antibiotics" - the words "following antibiotics" are unnecessary.

MicroPIE seems to have difficulty fully extracting results from long lists of compounds. I have pasted in a few examples below. I'm not sure what the consistent trend is, but compound names like "Polymyxin B" or "Penicillin G" may be confusing the algorithm.

For the description sentence "Sensitive to neomycin, tetracycline, polymyxin B and gentamicin, but not to ampicillin, streptomycin or kanamycin." MicroPIE is correctly picking up the sensitive ones, but outputs "ampicillin" in the antibiotic resistance column. The output should list "ampicillin#streptomycin#kanamycin".

For the description sentence "Sensitive to gentamicin, carbenicillin, lincomycin, neomycin, oleandomycin, polymixin B, streptomycin, tetracycline, chloramphenicol, doxycycline and erythromycin." MicroPIE outputs "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B", when the correct output should be "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B# streptomycin#tetracycline#chloramphenicol#doxycycline#erythromycin"

For the description sentence "Susceptible to erythromycin, ampicillin, cephaloridin and lincomycin. Not susceptible to kanamycin, benzylpenicillin, oxacillin, neomycin, streptomycin, gentamicin and polymyxin B. " - MicroPIE incorrectly outputs all compounds to the antibiotic sensitive column. The correct output should be "erythromycin#ampicillin#cephaloridin#lincomycin" to antibiotic sensitivity and "kanamycin#benzylpenicillin#oxacillin#neomycin#streptomycin#gentamicin#polymyxin B" to antibiotic resistance.

For the description sentence "Susceptible to (µg per disc unless indicated) co-trimoxazole#lincomycin#nalidixic acid#ampicillin#bacitracin (10), carbenicillin (100), cefotaxime (30), chloramphenicol (30), ciprofloxacin (5), erythromycin (15), gentamicin (30), lomefloxacin (30), nitrofurantoin (300), norfloxacin (10), novobiocin (30), oleandomycin (15), penicillin G (10), rifampicin (30), spectinomycin (100), tetracycline (30), doxycycline (10), cefuroxime (30), cefoperazone (75), roxithromycin (30), streptomycin (10) and vancomycin (30)...." MicroPIE is only outputting "lincomycin#co-trimoxazole#nalidixic acid#ampicillin#bacitracin#carbenicillin". It should be outputting "co-trimoxazole#lincomycin#nalidixic acid#ampicillin#bacitracin#carbenicillin#cefotaxime#chloramphenicol#ciprofloxacin#erythromycin#gentamicin#lomefloxacin#nitrofurantoin#norfloxacin#novobiocin#oleandomycin#penicillin G#rifampicin#spectinomycin#tetracycline#doxycycline#cefuroxime#cefoperazone#roxithromycin#streptomycin#vancomycin"

For the description sentence "Susceptible to (µg per disc unless indicated) cefotaxime (30), chloramphenicol (30), ciprofloxacin (5), cefuroxime (30) and cefoperazone (75) and resistant to amikacin (30), ampicillin (10), cefazolin (30), colistin (10), co-trimoxazole (25), erythromycin (15), penicillin G (10), polymyxin B (50 U), kanamycin (30), lomefloxacin (30), nalidixic acid (30), nitrofurantoin (300), streptomycin (10), tetracycline (30), tobramycin (10) and vancomycin (30). " MicroPIE is correctly outputting the list for antibiotic sensitivity. However, for antibiotic resistance it incorrectly outputs "amikacin#ampicillin#cefazolin#colistin#co-trimoxazole". The correct output should be "amikacin#ampicillin#cefazolin#colistin#co-trimoxazole#erythromycin#penicillin G#polymyxin B#kanamycin#lomefloxacin#nalidixic acid#nitrofurantoin#streptomycin#tetracycline#tobramycin#vancomycin"

For the description sentence "Susceptible to the following antibiotics: tetracycline, streptomycin, and chloramphenicol. Not susceptible to neomycin, kanamycin, penicillin G, or erythromycin. " MicroPIE incorrectly lists the output for antibiotic sensitivity as "following#streptomycin#following antibiotics: tetracycline#chloramphenicol#neomycin#kanamycin#penicillin G#erythromycin". The correct output should be "tetracycline#streptomycin#chloramphenicol". Similarly, MicroPIE incorrectly lists the output for antibiotic resistance as "0.01". The correct output should be "neomycin#kanamycin#penicillin G#erythromycin".

For the description sentence "Susceptible to penicillin G, chloramphenicol, cephalothin, lincomycin and oleandomycin, but not to polymyxin B, gentamicin, novobiocin, kanamycin or neomycin. " MicroPIE correctly extracts the output for antibiotic sensitivity, but incorrectly outputs "polymyxin B" for antibiotic resistance. The correct output should be "polymyxin B#gentamicin#novobiocin#kanamycin#neomycin".
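For reference, the expected outputs in the examples above can be reproduced with a small rule-based sketch. This is not how MicroPIE works (according to this thread it relies on Stanford CoreNLP parse trees); the names `extract`, `split_items`, and the regular expressions are hypothetical, and the patterns only cover the phrasings quoted in this issue ("but not to", "and resistant to", parenthetical doses).

```python
import re

# Parenthetical notes such as "(30)" or "(µg per disc unless indicated)".
PAREN = re.compile(r"\s*\([^()]*\)")
# Lead-in that introduces the sensitive list.
LEAD_IN = re.compile(r"^\s*(?:sensitive|susceptible)\s+to\s+", re.IGNORECASE)
# Phrases that switch from the sensitive list to the resistant list.
SWITCH = re.compile(r",?\s*(?:but\s+not\s+to|and\s+resistant\s+to)\s+", re.IGNORECASE)
# Item separators: a comma (optionally followed by "and"/"or") or a bare "and"/"or".
ITEM_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def split_items(fragment):
    """Split one compound list into individual names."""
    fragment = fragment.strip().rstrip(".").strip()
    return [item for item in ITEM_SEP.split(fragment) if item]


def extract(sentence):
    """Return (sensitive, resistant) as '#'-joined strings for one sentence."""
    text = PAREN.sub("", sentence)           # drop doses and unit notes
    text = LEAD_IN.sub("", text)             # drop "Sensitive to" / "Susceptible to"
    parts = SWITCH.split(text, maxsplit=1)   # split at the polarity switch, if any
    sensitive = split_items(parts[0])
    resistant = split_items(parts[1]) if len(parts) > 1 else []
    return "#".join(sensitive), "#".join(resistant)


print(extract("Sensitive to neomycin, tetracycline, polymyxin B and gentamicin, "
              "but not to ampicillin, streptomycin or kanamycin."))
```

Because items are split only on commas and on "and"/"or" surrounded by whitespace, two-word names such as "polymyxin B" and "penicillin G" stay intact.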

danveno commented 8 years ago

The values have been alphabetized in the output matrix.
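As an illustration of what alphabetizing a cell means here, a minimal sketch (the function name `alphabetize` is hypothetical and is not taken from the MicroPIE code base) that sorts the `#`-separated values of one matrix cell without regard to case:

```python
def alphabetize(cell: str, sep: str = "#") -> str:
    """Sort the values of one '#'-separated matrix cell, ignoring case."""
    values = [value.strip() for value in cell.split(sep) if value.strip()]
    return sep.join(sorted(values, key=str.casefold))


print(alphabetize("lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B"))
```

Sorting with `str.casefold` as the key keeps entries such as "Penicillin G" and "penicillin G" next to each other instead of placing all capitalized names first.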

danveno commented 8 years ago

For the description sentence "Sensitive to neomycin, tetracycline, polymyxin B and gentamicin, but not to ampicillin, streptomycin or kanamycin." MicroPIE is correctly picking up the sensitive ones, but outputs "ampicillin" in the antibiotic resistance column. The output should list "ampicillin#streptomycin#kanamycin".
This error was caused by a bug in the parseValueFromTree method in the AntibioticSyntacticExtractor class. Fixed.

danveno commented 8 years ago

For the description sentence "Sensitive to gentamicin, carbenicillin, lincomycin, neomycin, oleandomycin, polymixin B, streptomycin, tetracycline, chloramphenicol, doxycycline and erythromycin." MicroPIE outputs "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B", when the correct output should be "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B# streptomycin#tetracycline#chloramphenicol#doxycycline#erythromycin".

This is a sentence-splitting error, probably caused by the Stanford CoreNLP tool.

danveno commented 8 years ago

For the description sentence "Susceptible to erythromycin, ampicillin, cephaloridin and lincomycin. Not susceptible to kanamycin, benzylpenicillin, oxacillin, neomycin, streptomycin, gentamicin and polymyxin B. " - MicroPIE incorrectly outputs all compounds to the antibiotic sensitive column. The correct output should be "erythromycin#ampicillin#cephaloridin#lincomycin" to antibiotic sensitivity and "kanamycin#benzylpenicillin#oxacillin#neomycin#streptomycin#gentamicin#polymyxin B" to antibiotic resistance.

A possible bug in the identification of negation expressions was fixed.
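To illustrate the kind of negation handling involved, here is a minimal sketch that assigns each sentence to a column by its lead-in. It is an assumption-laden illustration, not the fix that was applied: the names `classify`, `NEGATIVE`, and `POSITIVE` are hypothetical, and the sentence split is a naive split on periods rather than the CoreNLP splitter.

```python
import re

# Negated lead-ins are tested first so that "Not susceptible to" is never
# mistaken for a positive statement.
NEGATIVE = re.compile(r"^\s*(?:not\s+(?:susceptible|sensitive)\s+to|resistant\s+to)\s+", re.IGNORECASE)
POSITIVE = re.compile(r"^\s*(?:susceptible|sensitive)\s+to\s+", re.IGNORECASE)
ITEM_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def classify(description):
    """Return {'sensitive': [...], 'resistant': [...]} for a short description."""
    result = {"sensitive": [], "resistant": []}
    # Naive sentence split: a period followed by whitespace or end of text.
    for sentence in re.split(r"\.(?:\s+|$)", description.strip()):
        match = NEGATIVE.match(sentence)
        column = "resistant"
        if match is None:
            match = POSITIVE.match(sentence)
            column = "sensitive"
        if match is None:
            continue  # sentence is not about antibiotics in a recognized form
        items = ITEM_SEP.split(sentence[match.end():])
        result[column].extend(item for item in items if item)
    return result


print(classify("Susceptible to erythromycin, ampicillin, cephaloridin and lincomycin. "
               "Not susceptible to kanamycin, benzylpenicillin, oxacillin, neomycin, "
               "streptomycin, gentamicin and polymyxin B. "))
```

The order of the two checks is the point of the sketch: testing the positive pattern first, or searching for "susceptible to" anywhere in the sentence, would send both sentences to the sensitivity column, which matches the error reported above.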

lmoore207 commented 8 years ago

Hi Jin, is this something that can be fixed? Thanks for working on these! Lisa

On Mon, Jun 27, 2016 at 8:29 PM, Jin notifications@github.com wrote:

For the description sentence "Sensitive to gentamicin, carbenicillin, lincomycin, neomycin, oleandomycin, polymixin B, streptomycin, tetracycline, chloramphenicol, doxycycline and erythromycin." MicroPIE outputs "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B", when the correct output should be "lincomycin#carbenicillin#gentamicin#neomycin#oleandomycin#polymixin B# streptomycin#tetracycline#chloramphenicol#doxycycline#erythromycin".

This is a sentence-splitting error, probably caused by the Stanford CoreNLP tool.


Dr. Lisa Moore Professor Department of Biological Sciences University of Southern Maine Portland, ME 04103 office: Science bldg, room 476B, C wing office phone: 207-780-4261 email contact: lrmoore@maine.edu

danveno commented 8 years ago

If the errors are caused by bugs in our algorithm, they will be fixed. Otherwise (if they are caused by other tools or by limitations in the algorithm design), fixing them will be too difficult for now. I suggest working on them in the future.

lmoore207 commented 8 years ago

This is fine. Is there a list that you have kept someplace (perhaps it is simply the GitHub issue tracker) indicating specific problems that will need to be worked on in the future? Thanks, Lisa

On Mon, Jun 27, 2016 at 9:30 PM, Jin notifications@github.com wrote:

If the errors are caused by bugs in our algorithm, they will be fixed. Otherwise (if they are caused by other tools or by limitations in the algorithm design), fixing them will be too difficult for now. I suggest working on them in the future.


danveno commented 8 years ago

For the description sentence "Susceptible to the following antibiotics: tetracycline, streptomycin, and chloramphenicol. Not susceptible to neomycin, kanamycin, penicillin G, or erythromycin. " MicroPIE incorrectly lists the output for antibiotic sensitivity as "following#streptomycin#following antibiotics: tetracycline#chloramphenicol#neomycin#kanamycin#penicillin G#erythromycin". The correct output should be "tetracycline#streptomycin#chloramphenicol". Similarly, MicroPIE incorrectly lists the output for antibiotic resistance as "0.01". The correct output should be "neomycin#kanamycin#penicillin G#erythromycin".

The sentence "Susceptible to the following antibiotics: tetracycline, streptomycin, and chloramphenicol. " is a bit complex for a computer to parse. I did my best to optimize the identification of phrase lists.
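One way to picture the cleanup needed for this sentence is a pre-processing step that removes the filler phrase before the list is split. The sketch below is an assumption for illustration only (the names `FILLER` and `clean_list` are hypothetical and do not come from MicroPIE); it handles "the following antibiotics:" and the serial comma before "and".

```python
import re

# Filler lead-in such as "the following antibiotics:" or "following:".
FILLER = re.compile(r"^\s*(?:the\s+)?following(?:\s+antibiotics)?\s*:\s*", re.IGNORECASE)
# A comma (optionally followed by "and"/"or") or a bare "and"/"or".
ITEM_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def clean_list(fragment):
    """Strip the filler lead-in, then split the remaining compound list."""
    fragment = FILLER.sub("", fragment).strip().rstrip(".").strip()
    return [item for item in ITEM_SEP.split(fragment) if item]


print("#".join(clean_list(
    "the following antibiotics: tetracycline, streptomycin, and chloramphenicol.")))
```

Removing the filler before splitting prevents fragments such as "following" and "following antibiotics: tetracycline" from appearing as values in the output.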