aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.
https://aphp.github.io/edsnlp/
BSD 3-Clause "New" or "Revised" License

eds.negation regex matches on "une" preceding entities which include another negation subtoken #256

Closed cvinot closed 6 months ago

cvinot commented 7 months ago

Description

The negation "preceding_regex" is currently r"ne(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))", which produces false positives on sentences such as:

"Situation compliquée d’une neutropénie fébrile aggravée."
"Le patient est traité d'une cure d'ALECTINIB depuis le ..."

The pattern is not anchored on word boundaries, so "ne" matches inside "une" and "ni" matches inside the entities themselves ("neutropénie", "ALECTINIB").

I fixed this on my side thanks to your customizable config, but figured I'd give you a heads-up.

Suggested change at line 104 in patterns.py, with word boundaries around "ne":

preceding_regex = [
    # ne (up to 3 words separated by spaces or newlines) pas/point/...
    r"\bne\b(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
]
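
The effect of the word boundaries can be checked outside the library with a minimal sketch using only Python's standard `re` module. It makes several assumptions: the names `UNANCHORED`, `ANCHORED` and `cues` are hypothetical, the second sentence is an illustrative true negation that does not come from the test suite, and the way edsnlp actually applies the pattern (text normalization, case handling) is not reproduced here.

```python
import re

# Shared lookahead: up to 3 words, then a negation particle.
LOOKAHEAD = r"(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"

# Pattern as quoted in the description (no word boundaries).
UNANCHORED = re.compile(r"ne" + LOOKAHEAD)
# Pattern with word boundaries around "ne".
ANCHORED = re.compile(r"\bne\b" + LOOKAHEAD)


def cues(pattern, text):
    """Return (start offset, matched text) for every negation cue found."""
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


# Sentence from the report: contains no negation at all.
false_positive = "Situation compliquée d'une neutropénie fébrile aggravée."
# Illustrative sentence (assumption): a genuine "ne ... pas" negation.
true_negation = "Le patient ne présente pas de fièvre."

for name, pattern in (("unanchored", UNANCHORED), ("anchored", ANCHORED)):
    print(name, cues(pattern, false_positive), cues(pattern, true_negation))
```

In this sketch the unanchored pattern fires on the sentence without negation, while the anchored one stays silent there and still picks up the standalone "ne" in the genuine negation.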

How to reproduce the bug

Add the following examples to test_negation.py at line 32:

"Situation aggravée par une <ent negated=false>neutropénie fébrile</ent>."
"Le patient est traité d'une cure d'<ent negated=false>ALECTINIB</ent> depuis le ..."

Then run pytest.