chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

How does one pass keyword arguments to preprocessing pipelines? #337

Closed gryBox closed 2 years ago

gryBox commented 3 years ago

Hi @bdewilde - Congratulations and a big thanks on the awesome 0.11.0 release!

what's wrong?

It would be nice to have an example demonstrating how to pass keyword arguments to individual functions in a preprocessing pipeline. For example, the preprocessing.normalize.repeating_chars function requires the chars argument.

preproc = preprocessing.make_pipeline(
    preprocessing.remove.html_tags,
    preprocessing.normalize.repeating_chars,
    preprocessing.normalize.quotation_marks,
    preprocessing.remove.accents,
)

relevant page or section

pipeline doc and example

If you have time, could you post a small example to this thread? Thanks in advance.

bdewilde commented 3 years ago

Hi @gryBox, I could've sworn I had an example showing this, but it looks like I cut it out for simplicity's sake. The easiest way to pass args into individual preprocessors is to use functools.partial, which binds the keyword arguments up front and returns a callable that takes only the text. For example:

>>> from functools import partial
>>> from textacy import preprocessing
>>> preproc = preprocessing.make_pipeline(
...     preprocessing.remove.html_tags,
...     partial(preprocessing.normalize.repeating_chars, chars=".", maxn=3),
...     preprocessing.normalize.quotation_marks,
...     partial(preprocessing.remove.accents, fast=True),
... )

I'll add a suitable example to the docs.
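
To see why partial is needed, here is a minimal sketch that runs without textacy installed. The make_pipeline and repeating_chars functions below are hypothetical stand-ins written for illustration, not textacy's actual implementations; they only assume that a pipeline calls each step with the text as its single positional argument and that chars is keyword-only.

```python
from functools import partial, reduce
import re


def make_pipeline(*funcs):
    """Hypothetical stand-in: apply each function to the text, left to right."""
    def pipeline(text):
        return reduce(lambda acc, func: func(acc), funcs, text)
    return pipeline


def repeating_chars(text, *, chars, maxn=1):
    """Hypothetical stand-in: shorten runs of `chars` longer than maxn to maxn."""
    pattern = "(?:" + re.escape(chars) + "){" + str(maxn + 1) + ",}"
    return re.sub(pattern, lambda match: chars * maxn, text)


def strip_whitespace(text):
    """Illustrative step that needs no extra arguments."""
    return text.strip()


# partial binds the keyword arguments now; the pipeline supplies the text later.
preproc = make_pipeline(
    strip_whitespace,
    partial(repeating_chars, chars=".", maxn=3),
)
result = preproc("  Wait.......  what?  ")
print(result)
```

Passing repeating_chars directly, without partial, would raise a TypeError as soon as the pipeline is called, because the pipeline has no way to supply chars. A lambda such as `lambda text: repeating_chars(text, chars=".", maxn=3)` should work equally well; partial is simply more compact.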