bilgehicyilmam / Dolphin


More accurate auto annotations with regular expression #64

Closed HasanGokce closed 3 years ago

HasanGokce commented 3 years ago

Issue type: task Parent issue: #50

Problem

For example, suppose we want to annotate the word "organ". The default find() method cannot be used for this because it matches substrings: it would also create an annotation for "organ" inside the word "organization".
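
A minimal sketch of the difference, using the sample sentence that appears later in this thread. The helper names `substring_positions` and `whole_word_positions` are made up for illustration:

```python
import re

text = "Effect of COVID-19 on organ. Health organization"
label = "organ"

def substring_positions(text, label):
    """Start offsets of every plain substring hit, as repeated str.find() calls give."""
    positions = []
    start = text.find(label)
    while start != -1:
        positions.append(start)
        start = text.find(label, start + 1)
    return positions

def whole_word_positions(text, label):
    """Start offsets where the label appears as a whole word."""
    pattern = rf"\b{re.escape(label)}\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]

print(substring_positions(text, label))   # [22, 36]
print(whole_word_positions(text, label))  # [22]
```

The second offset from the substring search (36) is the start of "organization", which is exactly the annotation we want to avoid.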

Solution

This regular expression might be used as a solution, and it can be improved for future cases:

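    import re

    # `label` and `text` are assumed to be defined by the caller.
    # Match the label as a whole word only; the lookbehind/lookahead skip a
    # label directly preceded by '>' or followed by '<' (already inside a tag).
    pattern = rf'(?<!>)\b{re.escape(label)}\b(?!<)'

    for match in re.finditer(pattern, text, re.IGNORECASE):
        # number of characters to cut from the left of the match
        num_character = 20

        # match start and end positions
        s = match.start()
        e = match.end()
        print('String match "%s" at %d:%d' % (text[s:e], s, e))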


## Source
* https://stackoverflow.com/questions/60925649/regex-look-ahead-behind-with-word-boundary
* https://regex101.com/r/vY3pY3/1
* https://stackoverflow.com/questions/21063742/greater-than-and-less-than-symbol-in-regular-expressions
* https://www.tutorialspoint.com/How-do-we-use-re-finditer-method-in-Python-regular-expression
HasanGokce commented 3 years ago
find_keyword executed for assay
find_keyword executed for specimen
find_keyword executed for gene
The BioFire® <div id="covid19-b3f848d7-6628-44a8-b350-2ae33aace825">COVID-19</div> Test and Respiratory
 Panel 2.1 (RP2.1) are rapid, fully automated <div id="covid19-8d823cd6-a42b-437c-bd58-c5324deec56f">assay</div>s
 for the detection of severe acute respiratory <div id="covid19-a4dfee59-0a84-4a4e-81a4-907f5196af14">syndrome</div>
 coronavirus 2 (SARS-CoV-2) in nasopharyngeal swabs. In the case of the RP2.1, an additional 21 viral
and bacterial pathogens can be detected. Both tests have received emergency use authorization from the
 U.S. Food & Drug Administration and Interim Order authorization from Health Canada
for use in clinical laboratories. We evaluated the performance characteristics of these tests in comparison to a
 laboratory-developed real-time PCR <div id="covid19-3611eb46-47d3-4515-88cd-0a5930fec004">assay</div> targeting
 the 
viral RNA-dependent RNA polymerase and E <div id="covid19-6bb3c9b4-500d-4ad7-b422-6cd03eb64851">gene</div>s. 
A total of 78 tests were performed using the BioFire <div id="covid19-22c2135f-3dbe-44d0-8b57-901e159bd6c3">COVID-19</div> 
Test, including 30 clinical <div id="covid19-272cc4f5-9bc5-4c87-944c-00ceee955a37">specimen</div>s and 48 tests in a 
limit of detection study; 57 tests were performed using the RP2.1 for evaluation of SARS-CoV-2 detection, including 30
 clinical <div id="covid19-66ec475b-a486-4b29-ab7e-e597e45f7e03">specimen</div>s and 27 tests for limit of detection. 
Results showed 100% concordance between the BioFire <div id="covid19-e47ae125-7edc-44df-9bef-59a906d4bb23">assay</div>s 
and the laboratory-developed test for all clinical samples tested, and acceptable 
performance of both BioFire assays at their stated limits of detection. Conclusively, the BioFire 
<div id="covid19-f4e7adf8-49f4-4130-b362-adc90c68e18d"> COVID-19</div> Test and RP2.1 are highly sensitive assays
 that can be effectively used in the clinical laboratory
 for rapid SARS-CoV-2 testing.
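
Markup of this shape could be produced by a replacement callback along the following lines. This is only a sketch under assumptions: `annotate_keyword` is a hypothetical name (not the `find_keyword` function from the log), the `covid19` id prefix is copied from the output above, and the pattern is a guess at a whole-word rule that skips text already wrapped in a tag.

```python
import re
import uuid

def annotate_keyword(text, label, prefix="covid19"):
    """Wrap each whole-word occurrence of label in a <div> with a unique id."""
    # Skip occurrences directly preceded by '>' or followed by '<',
    # which are assumed to be wrapped already.
    pattern = rf"(?<!>)\b{re.escape(label)}\b(?!<)"

    def wrap(match):
        # Keep the matched text as-is so the original casing survives.
        return f'<div id="{prefix}-{uuid.uuid4()}">{match.group(0)}</div>'

    return re.sub(pattern, wrap, text, flags=re.IGNORECASE)

sample = "Effect of COVID-19 on organ. Health organization"
once = annotate_keyword(sample, "organ")
twice = annotate_keyword(once, "organ")  # second pass should change nothing
print(once)
print(twice == once)
```

Because this sketch matches whole words only, it would not tag `assay` inside `assays` the way the output above does.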
HasanGokce commented 3 years ago

String: Effect of COVID-19 on organ. Health organization

The plural form of the word can be found with this regular expression:
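
One possible form, offered as a sketch and not necessarily the expression meant here, appends an optional `s` or `es` to the label. The helper name `plural_pattern` and the sample text are hypothetical, and irregular plurals are not covered:

```python
import re

def plural_pattern(label):
    """Whole-word pattern for label with an optional 's' or 'es' ending."""
    return rf"\b{re.escape(label)}(?:es|s)?\b"

# Hypothetical sample text with a singular, a plural, and a longer word.
text = "Effect of COVID-19 on organ. Organs and organization"

matches = [m.group(0) for m in re.finditer(plural_pattern("organ"), text, re.IGNORECASE)]
print(matches)  # ['organ', 'Organs']
```

The trailing `\b` still rejects "organization", since the character after the optional suffix must be a word boundary.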

HasanGokce commented 3 years ago