Closed: OlivierHassanaly closed this issue 1 week ago.
Hi @OlivierHassanaly!
`doc.text` always contains the original text of the document. To use the results of the `eds.normalizer` pipeline, you should use `edsnlp.utils.doc_to_text.get_text`, as shown here: http://aphp.github.io/edsnlp/latest/pipes/core/normalizer/#usage
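```python
import edsnlp
from edsnlp.utils.doc_to_text import get_text

# Normalizer options: lowercase, strip accents and remove pollution,
# but leave quotes and spaces untouched
config = dict(
    lowercase=True,
    accents=True,
    quotes=False,
    spaces=False,
    pollution=True,
)

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
doc = nlp(text)

# Rebuild the text from the tokens, skipping those excluded as pollution
print(get_text(doc, attr='TEXT', ignore_excluded=True))
# Out: Pneumopathie à `coronavirus'
```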
It seems that the `eds.normalizer` pipe has no effect.
I am using edsnlp version 0.13.1.
How to reproduce the bug
```python
import edsnlp

config = dict(lowercase=True, accents=True, quotes=False, spaces=False, pollution=True)
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)
text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
doc = nlp(text)
print(doc.text)
```
I get the unchanged text as a result.