Closed by hammurabi-ds 5 years ago
Hi @Hamurabbi, we have not looked at supporting NLP preprocessing in scikit-learn. Could you elaborate on how you would like to use Danish NLP models in scikit-learn?
Hi. Here is an example of what such a pipeline might look like (I use a similar package that is not open source, but this is roughly how it could look). It should be fairly simple to build the wrappers and make them compatible with a scikit-learn Pipeline.
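from sklearn.pipeline import Pipeline

prep = Preprocessor('english')  # Preprocessor and the step classes below are the proposed wrappers, not scikit-learn classes
pip = Pipeline([
    ('word_token', WordTokenizer(prep)),
    ('punct', PunctuationRemover(prep)),
    ('pos', POSTagger(prep)),
    ('lemma', Lemmatizer(prep)),
    ('stopword', StopwordRemover(prep)),
])
results = pip.fit_transform(RAW_TEXT)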
This pipeline object may now be saved and reused.
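For illustration, here is a minimal sketch of how such wrappers could be written and how the pipeline could then be saved and restored. The two classes are hypothetical stand-ins that reuse the step names above; they are not the code from the private package, they take no `prep` argument, and the whitespace tokenization is a deliberate simplification. Saving with `pickle` is one option; joblib is a common alternative.

```python
import pickle
import string

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class WordTokenizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: splits each document on whitespace."""

    def fit(self, X, y=None):
        # Stateless step; the trailing-underscore attribute only marks
        # the estimator as fitted for scikit-learn's fitted checks.
        self.fitted_ = True
        return self

    def transform(self, X):
        return [doc.split() for doc in X]


class PunctuationRemover(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: strips ASCII punctuation, drops empty tokens."""

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def transform(self, X):
        table = str.maketrans("", "", string.punctuation)
        cleaned = [[tok.translate(table) for tok in doc] for doc in X]
        return [[tok for tok in doc if tok] for doc in cleaned]


pipe = Pipeline([
    ("word_token", WordTokenizer()),
    ("punct", PunctuationRemover()),
])

raw_text = ["Hello, world!", "Pipelines are reusable ."]
tokens = pipe.fit_transform(raw_text)

# Round-trip through pickle to mimic saving the pipeline and loading it later.
restored = pickle.loads(pickle.dumps(pipe))
restored_tokens = restored.transform(raw_text)
```

Because each step subclasses `BaseEstimator` and `TransformerMixin` and implements `fit` and `transform`, `Pipeline` can chain the steps in order, and the restored object produces the same tokens as the original.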
Well, thank you for the suggestion and clarification. We will look into it :)
Can I sequentially apply your NLP preprocessors in a scikit-learn Pipeline? If not, I think it would be an advantage for the package.