Add custom FeatureUnion

clips / wordkit

Featurize words into orthographic and phonological vectors.

GNU General Public License v3.0

We currently use sklearn.pipeline.FeatureUnion to combine different featurizers and corpora. This works well, but we want to replace it with a custom version that supports:

Merging different sources (e.g. merging the frequency table for a word from one data source with perceptual characteristics for the same word from another corpus).
Adding weights to transformers (e.g. assigning a weight of 0.5 to a phonology transformer to reduce the influence of phonology in any distance calculations).
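To make the first item concrete, here is a rough sketch of what merging two sources could look like. It assumes each source is a list of dicts that share an "orthography" key; the function name merge_sources, the field names, and the values are all made up for illustration and are not part of wordkit.

```python
def merge_sources(left, right, key="orthography"):
    """Inner-join two lists of word records on a shared key.

    Hypothetical helper: keeps only words that occur in both sources.
    On a field-name clash, the value from `right` wins.
    """
    # Index the second source by word so each lookup is O(1).
    right_by_key = {record[key]: record for record in right}
    merged = []
    for record in left:
        match = right_by_key.get(record[key])
        if match is not None:
            merged.append({**record, **match})
    return merged


# Made-up example data: a frequency table and a perceptual norms table.
frequencies = [
    {"orthography": "cat", "frequency": 120},
    {"orthography": "dog", "frequency": 95},
]
perceptual = [
    {"orthography": "cat", "concreteness": 4.9},
    {"orthography": "eel", "concreteness": 4.7},
]

merged = merge_sources(frequencies, perceptual)
```

Only "cat" survives here because it is the only word present in both sources. Whether a real implementation should use an inner join or keep unmatched words is an open design choice.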

The first point definitely makes a lot of sense and will be added ASAP, but I'm not sure about the second one.
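On the second point, it may be worth noting that sklearn's FeatureUnion already accepts a transformer_weights argument, which multiplies each transformer's output by a constant. The sketch below shows that behavior; the two FunctionTransformer objects are hypothetical stand-ins for real orthography and phonology featurizers, and the input numbers are invented.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

# Hypothetical stand-ins for featurizers: each one selects a single
# column of the input as its "feature block".
orthography = FunctionTransformer(lambda X: X[:, :1])
phonology = FunctionTransformer(lambda X: X[:, 1:])

union = FeatureUnion(
    [("orthography", orthography), ("phonology", phonology)],
    # Scale the phonology block by 0.5; unlisted transformers are unscaled.
    transformer_weights={"phonology": 0.5},
)

X = np.array([[1.0, 4.0], [2.0, 8.0]])
features = union.fit_transform(X)
# The phonology column is halved: [[1.0, 2.0], [2.0, 4.0]]
```

Because the weight is applied to the feature values themselves, it shrinks the phonology block's contribution to any distance computed on the concatenated vectors. A custom union could reuse the same idea rather than introduce a separate mechanism.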

Add custom FeatureUnion #2