edponce / FACET

Framework for Annotation and Concept Extraction in Text

Support user-defined preprocessing converters #8

Status: Open. edponce opened this issue 4 years ago.

edponce commented 4 years ago

Provide an option to specify a module or directory from which FACET dynamically imports text preprocessing methods to apply during installation.
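One way to do the dynamic import is sketched below using only the standard `importlib` machinery. It is an assumption, not FACET's actual code. `load_module_from_file` and `load_preprocessors` are hypothetical names, and the `module:function` keys are an illustrative naming scheme. The loader accepts either a single `.py` file or a directory and collects every public function that module defines.

```python
import importlib.util
import inspect
from pathlib import Path
from types import ModuleType
from typing import Callable, Dict, Union


def load_module_from_file(path: Union[str, Path]) -> ModuleType:
    """Import a Python source file as a module without it being on sys.path."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def load_preprocessors(location: Union[str, Path]) -> Dict[str, Callable[[str], str]]:
    """Collect public functions from one module file or every .py file in a directory.

    Keys follow a hypothetical 'module:function' naming scheme.
    """
    location = Path(location)
    files = sorted(location.glob("*.py")) if location.is_dir() else [location]
    funcs: Dict[str, Callable[[str], str]] = {}
    for file in files:
        module = load_module_from_file(file)
        for name, obj in inspect.getmembers(module, inspect.isfunction):
            # Skip private helpers and functions the module merely imported.
            if not name.startswith("_") and obj.__module__ == module.__name__:
                funcs[f"{module.__name__}:{name}"] = obj
    return funcs
```

Because the file is executed with `exec_module`, this approach trusts user code completely. That is usually acceptable for a local configuration option, but it is worth stating in the documentation.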

edponce commented 4 years ago

A key idea is that the installed terms can be preprocessed beyond simple normalization and lowercasing, for example by removing stop words and lemmatizing. However, if the terms are changed, a copy of the original terms must be saved for output purposes.
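A minimal sketch of keeping the originals follows, assuming installed terms are stored as a mapping from the preprocessed key to the original strings. The stop-word list and the function names are hypothetical, and lemmatization is omitted. The sketch shows why a copy is needed: several original terms can collapse to the same preprocessed key, so the output step has to look the originals up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "and"}


def preprocess(term: str) -> str:
    """Lowercase, collapse whitespace, and drop stop words (no lemmatization here)."""
    tokens = term.lower().split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def install_terms(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Map each preprocessed key to the original term(s) that produced it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for term in terms:
        key = preprocess(term)
        # Skip terms that preprocess to nothing (e.g. made only of stop words).
        if key and term not in index[key]:
            index[key].append(term)
    return dict(index)
```

For example, `install_terms(["Cancer of the Lung", "cancer  lung"])` stores both originals under the single key `"cancer lung"`. A match on that key can then report the original wording instead of the stripped form.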

edponce commented 4 years ago

Consider supporting custom modules listed in configuration files. For example, given 'module': ['mod1.py', 'mod2.py'], users could select functions as options via the 'mod1:func1' syntax. We might even support passing parameter values with a 'key=value' syntax.
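The option syntax could be parsed as in the sketch below. The issue does not say how `key=value` pairs attach to `mod1:func1`, so this assumes a hypothetical whitespace-separated form, `"mod1:func1 key1=value1 key2=value2"`. `parse_option` and `resolve` are illustrative names. Values are read as Python literals when possible and otherwise kept as strings. The registry is assumed to map `"module:function"` keys to callables.

```python
import ast
from functools import partial
from typing import Any, Callable, Dict, Tuple


def parse_option(spec: str) -> Tuple[str, str, Dict[str, Any]]:
    """Split 'mod1:func1 key=value ...' into (module, function, kwargs).

    The whitespace-separated 'key=value' suffix is a hypothetical syntax.
    """
    target, *params = spec.split()
    module_name, sep, func_name = target.partition(":")
    if not sep or not module_name or not func_name:
        raise ValueError(f"expected 'module:function', got {target!r}")
    kwargs: Dict[str, Any] = {}
    for param in params:
        key, eq, raw = param.partition("=")
        if not eq or not key:
            raise ValueError(f"expected 'key=value', got {param!r}")
        try:
            kwargs[key] = ast.literal_eval(raw)  # e.g. '3' -> 3, 'True' -> True
        except (ValueError, SyntaxError):
            kwargs[key] = raw  # fall back to the raw string, e.g. 'abc'
    return module_name, func_name, kwargs


def resolve(spec: str, registry: Dict[str, Callable[..., str]]) -> Callable[..., str]:
    """Look up 'module:function' in a registry and bind its keyword arguments."""
    module_name, func_name, kwargs = parse_option(spec)
    func = registry[f"{module_name}:{func_name}"]
    return partial(func, **kwargs)
```

Using `ast.literal_eval` instead of `eval` keeps configuration values from executing arbitrary code. The user module itself still runs with full trust.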

We could also include default module functions, for example for converters: 'unidecode', 'lower', and 'upper'.
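A sketch of such built-in converters, applied in sequence, might look like the code below. `DEFAULT_CONVERTERS` and `convert` are hypothetical names. The `"unidecode"` entry here is an assumption as well. It uses a standard-library stand-in (`ascii_fold`, NFKD decomposition followed by dropping non-ASCII characters) so the block runs without extra packages. That stand-in strips accents from Latin text but discards non-Latin scripts rather than transliterating them.

```python
import unicodedata
from typing import Callable, Dict, Iterable


def ascii_fold(text: str) -> str:
    """Rough stdlib stand-in for 'unidecode': strip accents, drop other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Built-in converters available without loading any user module.
DEFAULT_CONVERTERS: Dict[str, Callable[[str], str]] = {
    "unidecode": ascii_fold,
    "lower": str.lower,
    "upper": str.upper,
}


def convert(text: str, names: Iterable[str]) -> str:
    """Apply the named default converters in the given order."""
    for name in names:
        text = DEFAULT_CONVERTERS[name](text)
    return text
```

With the third-party `unidecode` package installed, `from unidecode import unidecode` could replace `ascii_fold` in the registry to get real transliteration. User-loaded functions could be merged into the same dictionary under their `module:function` keys.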