alan-turing-institute / defoe

Code to analyse books and newspapers data using Apache Spark.
MIT License

Support configuration of word preprocessing type #22

Closed. mikej888 closed this issue 5 years ago.

mikej888 commented 5 years ago

The following queries use PreprocessWordType.LEMMATIZE by default:

Changing to another preprocessing type currently requires editing the source code. Update the query configuration file to:

Extend support for preprocessing to all other queries across all data models.

mikej888 commented 5 years ago

Addressed in 9b7daea52823380a2934f2485d4fee022d7da67d. Queries now take a YAML file that specifies the preprocessing type and the path (absolute, or relative to the current file) to a plain-text list of words/sentences.
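
For illustration, here is a minimal sketch of how a query might read such a file. It is not the code from the commit above. The key names `preprocess` and `data`, the helper names, and the enum members other than `LEMMATIZE` are assumptions made for this example, and `load_query_config` assumes PyYAML is installed.

```python
import os
from enum import Enum


class PreprocessWordType(Enum):
    """Stand-in enum for this sketch; only LEMMATIZE is named in this issue."""
    NORMALIZE = 1
    STEM = 2
    LEMMATIZE = 3
    NONE = 4


def parse_preprocess_type(config, default=PreprocessWordType.LEMMATIZE):
    """Map the (hypothetical) 'preprocess' key to an enum member."""
    name = config.get("preprocess")
    if name is None:
        # Fall back to the default the issue describes.
        return default
    try:
        return PreprocessWordType[str(name).upper()]
    except KeyError:
        raise ValueError("Unknown preprocessing type: {}".format(name)) from None


def resolve_data_path(config, config_file):
    """Return the (hypothetical) 'data' path, resolving relative paths
    against the directory that holds the configuration file."""
    data = config["data"]
    if os.path.isabs(data):
        return data
    base = os.path.dirname(os.path.abspath(config_file))
    return os.path.normpath(os.path.join(base, data))


def load_query_config(config_file):
    """Read the YAML file and return (preprocess_type, data_path)."""
    import yaml  # PyYAML, a third-party package

    with open(config_file) as f:
        config = yaml.safe_load(f) or {}
    return parse_preprocess_type(config), resolve_data_path(config, config_file)
```

Under these assumptions, a configuration file containing `preprocess: stem` and `data: words.txt` would select `PreprocessWordType.STEM` and look for `words.txt` next to the configuration file. Omitting `preprocess` keeps the `LEMMATIZE` default.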