edponce opened 4 years ago
A key idea is that the installed terms can be preprocessed beyond simple normalization and lowercasing, for example by removing stop words and lemmatizing. However, if the terms are changed, a copy of the original terms must be saved for output purposes.
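A minimal sketch of this idea, assuming installation builds a mapping from each preprocessed key back to the original terms that produced it. `STOP_WORDS`, `naive_lemma`, `preprocess`, and `install_terms` are hypothetical names; the suffix-stripping `naive_lemma` is only a toy stand-in for a real lemmatizer (e.g. from NLTK or spaCy).

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real setup might load NLTK or spaCy lists.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on"}


def naive_lemma(word: str) -> str:
    """Toy stand-in for a lemmatizer: strips a few plural suffixes."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(term: str) -> str:
    """Lowercase, tokenize, drop stop words, and lemmatize each token."""
    tokens = re.findall(r"\w+", term.lower())
    return " ".join(naive_lemma(t) for t in tokens if t not in STOP_WORDS)


def install_terms(terms):
    """Map each preprocessed key to the original terms that produced it,
    so the unmodified originals remain available for output."""
    index = defaultdict(list)
    for term in terms:
        index[preprocess(term)].append(term)
    return dict(index)
```

For example, `install_terms(["The Kidneys", "kidney"])` stores both originals under the single key `"kidney"`, so a match on the preprocessed key can still report the terms exactly as they were installed.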
Consider supporting custom modules listed in configuration files. For example, given 'module': ['mod1.py', 'mod2.py'], the user could reference functions as options using the 'mod1:func1' syntax. Passing parameter values with a 'key=value' syntax could also be supported.
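One way the configuration syntax could work is sketched below, assuming the files under 'module' are imported by file path and options are written as `mod:func[:key=value,...]`. The trailing `:key=value,...` form, as well as `parse_option`, `load_modules`, and `resolve_option`, are hypothetical choices for illustration, not an existing FACET API.

```python
import ast
import importlib.util
from functools import partial
from pathlib import Path


def parse_option(spec: str):
    """Split 'mod:func[:k1=v1,k2=v2]' into (module, function, kwargs).
    The trailing ':k=v,...' part is a hypothetical parameter syntax."""
    parts = spec.split(":", 2)
    if len(parts) < 2:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module_name, func_name = parts[0], parts[1]
    kwargs = {}
    if len(parts) == 3 and parts[2]:
        for pair in parts[2].split(","):
            key, _, raw = pair.partition("=")
            try:
                # Interpret numbers, booleans, quoted strings as Python literals.
                kwargs[key.strip()] = ast.literal_eval(raw.strip())
            except (ValueError, SyntaxError):
                kwargs[key.strip()] = raw.strip()  # fall back to a plain string
    return module_name, func_name, kwargs


def load_modules(paths):
    """Import each listed file (e.g. config['module']) under its file stem."""
    modules = {}
    for path in map(Path, paths):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        modules[path.stem] = module
    return modules


def resolve_option(spec: str, modules):
    """Return a callable for 'mod:func[:k=v,...]' with any kwargs bound."""
    module_name, func_name, kwargs = parse_option(spec)
    func = getattr(modules[module_name], func_name)
    return partial(func, **kwargs) if kwargs else func
```

Binding parameters with `functools.partial` keeps every resolved option a one-argument callable, so the installer can apply configured functions uniformly regardless of whether they took extra parameters.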
We could also include default module functions, for example converters such as 'unidecode', 'lower', and 'upper'.
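A minimal sketch of built-in converters referenced by bare name, assuming a simple name-to-function registry. `DEFAULT_CONVERTERS`, `ascii_fold`, and `apply_converters` are hypothetical names. To stay dependency-free, 'unidecode' is approximated here with standard-library Unicode decomposition; the third-party `unidecode` package transliterates far more (including non-Latin scripts) and could be swapped in if available.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate 'unidecode' with the standard library: decompose accented
    characters (NFKD) and drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical built-in converter registry, referenced by bare name in a config.
DEFAULT_CONVERTERS = {
    "unidecode": ascii_fold,
    "lower": str.lower,
    "upper": str.upper,
}


def apply_converters(text: str, names) -> str:
    """Apply the named converters left to right."""
    for name in names:
        text = DEFAULT_CONVERTERS[name](text)
    return text
```

For instance, `apply_converters("Café Señor", ["unidecode", "lower"])` yields `"cafe senor"`. Custom 'mod1:func1' options could be merged into the same registry so that default and user-supplied functions are looked up the same way.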
Provide an option to specify a module or directory from which FACET dynamically imports text preprocessing methods to apply during installation.
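Dynamic discovery from a module file or a directory could look roughly like the sketch below. It assumes every public, locally defined function in each `.py` file is a candidate preprocessing method and registers it under the 'module:function' naming from the configuration idea. `discover_preprocessors` is a hypothetical helper, not part of FACET.

```python
import importlib.util
import inspect
from pathlib import Path


def discover_preprocessors(location):
    """Import a .py file, or every .py file in a directory, and collect its
    public functions keyed as 'module:function'."""
    location = Path(location)
    files = sorted(location.glob("*.py")) if location.is_dir() else [location]
    registry = {}
    for path in files:
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, func in inspect.getmembers(module, inspect.isfunction):
            # Skip private helpers and functions imported from other modules.
            if not name.startswith("_") and func.__module__ == module.__name__:
                registry[f"{path.stem}:{name}"] = func
    return registry
```

Filtering on `__module__` avoids registering helpers a user module merely imports (for example `os.path.join`). Note that executing arbitrary files this way runs their top-level code, so the option should only point at trusted locations.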