The feature extraction functions are currently all plain functions. Making them objects would cut a fair amount of overhead and open up more options for future expansion.
Currently, users have to do something like:
all_phonemes = get_characters(data, field='phonology')
features = extract_one_hot_phonemes(all_phonemes)
o = ONCTransformer(features)
X = o.fit_transform(data)
This isn't too bad, but it could be simplified by making the extraction step above atomic, so that a single call both collects the phonemes from the data and builds the features.
But this would require adding the same couple of lines of code to every extraction function.
So, what I propose is:
o = ONCTransformer(OneHotPhonemeExtractor(), field='phonology')
X = o.fit_transform(data)
This folds the step of extracting the relevant phonemes from the data into the transformer, and lets us use inheritance for e.g. type checking, chaining etc.
I think it also puts less of a burden on the user, who no longer has to separately keep track of the features.
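To make the inheritance argument concrete, here is a rough sketch of how the shared lines could live in one place. Only `OneHotPhonemeExtractor` and the `field` argument come from the proposal above; `BaseExtractor`, `extract`, `_process` and the list-of-dicts data format are assumptions made for illustration, not the existing API.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical base class: the shared lines are written once, here."""

    def extract(self, data, field):
        # Assumption: `data` is a list of dicts whose `field` value is a string.
        # Collect the unique symbols in that field, in a stable (sorted) order.
        symbols = sorted({char for item in data for char in item[field]})
        return self._process(symbols)

    @abstractmethod
    def _process(self, symbols):
        """Turn a sorted list of symbols into a feature dictionary."""


class OneHotPhonemeExtractor(BaseExtractor):
    def _process(self, symbols):
        # One-hot coding: each symbol maps to a vector containing a single 1.
        size = len(symbols)
        return {
            symbol: tuple(int(i == j) for j in range(size))
            for i, symbol in enumerate(symbols)
        }


data = [{"phonology": "kat"}, {"phonology": "tak"}]
features = OneHotPhonemeExtractor().extract(data, field="phonology")
```

With something like this, a transformer could call `extractor.extract(data, field)` inside its own `fit`, and a new extractor would only need to implement `_process`.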