aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.
https://aphp.github.io/edsnlp/
BSD 3-Clause "New" or "Revised" License

Feature request: better API for adding pipes to a pipeline #279

Closed percevalw closed 3 months ago

percevalw commented 4 months ago

Feature type

Adding a pipe to a pipeline has quite a few limitations at the moment:

import edsnlp

nlp = edsnlp.blank('eds')
nlp.add_pipe('eds.matcher', config={"terms": {"key": ["expr 1", "expr 2"]}})
...

We could deviate from spaCy's iconic API and come up with something better along these lines:

import edsnlp
import edsnlp.pipes as eds

nlp = edsnlp.blank('eds')
nlp.add_pipe(eds.matcher(terms={"key": ["expr 1", "expr 2"]}))

The problem is that some pipes (like eds.matcher) require an nlp object at init time, which add_pipe currently provides. We could ask the user to pass the nlp argument explicitly, as in nlp.add_pipe(eds.matcher(nlp=nlp, terms={"key": ["expr 1", "expr 2"]})), but this feels redundant.

Another option is to have promise = eds.matcher(terms={"key": ["expr 1", "expr 2"]}) return a "promise" (or "curried") component when the required nlp argument is missing. The promise would only be turned into an actual component once it is added to the pipeline, via promise.instantiate(nlp=self) inside add_pipe. This feels like an anti-pattern, so it should be documented extensively and should emit a warning whenever a user tries to use a non-initialized pipe outside a pipeline.
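To make the "promise" idea concrete, here is a minimal, self-contained sketch of how it could work. It does not use edsnlp at all: PipePromise, Matcher, matcher and Pipeline are hypothetical stand-ins written for illustration, and only the names add_pipe and instantiate(nlp=...) come from the proposal above. The actual implementation may look quite different.

```python
import warnings


class PipePromise:
    """Hypothetical placeholder: stores a pipe class and its arguments until an nlp is known."""

    def __init__(self, factory, **kwargs):
        self.factory = factory
        self.kwargs = kwargs

    def instantiate(self, nlp):
        # Build the real component now that the pipeline object is available.
        return self.factory(nlp=nlp, **self.kwargs)

    def __call__(self, doc):
        # Using the promise directly is almost certainly a mistake, so warn loudly
        # and return the input unchanged.
        warnings.warn(
            "This pipe has not been added to a pipeline yet and was not initialized; "
            "pass nlp=... or add it with nlp.add_pipe(...)."
        )
        return doc


class Matcher:
    """Hypothetical stand-in for a pipe that needs an nlp object at init time."""

    def __init__(self, nlp, terms):
        self.nlp = nlp
        self.terms = terms

    def __call__(self, doc):
        return doc


def matcher(nlp=None, **kwargs):
    # Without an nlp object, return a promise instead of a real component.
    if nlp is None:
        return PipePromise(Matcher, **kwargs)
    return Matcher(nlp=nlp, **kwargs)


class Pipeline:
    """Hypothetical stand-in for the nlp object."""

    def __init__(self):
        self.pipes = []

    def add_pipe(self, pipe):
        # Resolve promises by injecting the pipeline itself as the nlp argument.
        if isinstance(pipe, PipePromise):
            pipe = pipe.instantiate(nlp=self)
        self.pipes.append(pipe)
        return pipe


nlp = Pipeline()
pipe = nlp.add_pipe(matcher(terms={"key": ["expr 1", "expr 2"]}))
```

In this sketch the user never passes nlp by hand, while matcher(nlp=nlp, ...) still works for anyone who prefers to build the component explicitly. The cost is that matcher(...) can return two different kinds of objects, which is why the warning in PipePromise.__call__ matters.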

percevalw commented 3 months ago

Solved in #279