angeloskath / php-nlp-tools

Natural Language Processing Tools in PHP
Do What The F*ck You Want To Public License
749 stars, 153 forks

Implemented Stop words Abstract Factory, included EnglishStopWords Class #13

Closed yooper closed 11 years ago

yooper commented 11 years ago

Any thoughts on this pull request?

angeloskath commented 11 years ago

I would not implement this functionality that way. I would probably think of a generic interface like TokenTransformInterface or TransformInterface which would take a set of tokens, change them and return them.

In any case, shouldn't it at least extend a generic filter interface or implementation? I think of it like SPL's FilterIterator. Do you think we need an EnglishStopWords, GreekStopWords, etc.? Shouldn't we simply accept an array of words and filter the tokens against it?
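A minimal sketch of the "accept an array of words and filter" idea, built on SPL's FilterIterator. The class name StopWordFilterIterator is hypothetical and not part of php-nlp-tools. The point is only that the language lives in the word list, not in the class:

```php
<?php

/**
 * Hypothetical, language-agnostic stop word filter (an assumption for
 * illustration, not library API). The caller supplies the word list, so
 * no EnglishStopWords or GreekStopWords subclass is needed.
 */
class StopWordFilterIterator extends FilterIterator
{
    /** @var array stop words stored as keys for constant-time lookup */
    private $stopWords;

    public function __construct(Iterator $tokens, array $stopWords)
    {
        parent::__construct($tokens);
        $this->stopWords = array_fill_keys($stopWords, true);
    }

    /** Keep the current token only if it is not a stop word. */
    public function accept(): bool
    {
        return !isset($this->stopWords[$this->current()]);
    }
}

$tokens = new ArrayIterator(array('the', 'cat', 'sat', 'on', 'the', 'mat'));
$filtered = new StopWordFilterIterator($tokens, array('the', 'on'));

// Pass false so the result is re-indexed from 0.
$kept = iterator_to_array($filtered, false);
print_r($kept);
```

The comparison here is case-sensitive, so tokens would need to be lower-cased beforehand if the word list is lower case.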

By the way, if you look at how I described the TransformInterface above, you will notice the similarities between that description and the FeatureFactories. That is why I am not sure whether those should be implemented separately or not.

yooper commented 11 years ago

I think you are right. What would work better is implementing a strategy design pattern that provides an interface for all classes that modify or filter a token. This interface will be called TransformStrategyInterface and can be a generic interface for use throughout the library. TransformStrategyInterface will have one method, transform. All classes that implement this interface will return either the token (possibly modified) or null.

The null option will indicate that the token can be removed. We can then design a way to add multiple transform classes to a training set, a training document, or some other method after tokenization is complete. I am in favor of adding the filter or modify instances to the TrainingSet. This would simplify the API by making it easy to add multiple transformation rules/filters (i.e. text normalization to lower case, stemmers, lemmatisation and stop word filters) in advance of analyzing the data or building the feature factories.
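A rough sketch of how the proposal above could look. Only the names TransformStrategyInterface and transform come from the discussion. The two implementing classes and the applyTransforms helper are hypothetical, and the sketch uses a standalone function instead of guessing at how a TrainingSet would hold the transforms:

```php
<?php

/**
 * Interface as proposed in the thread: a single transform() method that
 * returns the (possibly modified) token, or null to drop the token.
 */
interface TransformStrategyInterface
{
    public function transform($token);
}

/** Hypothetical example: normalize tokens to lower case. */
class LowerCaseTransform implements TransformStrategyInterface
{
    public function transform($token)
    {
        return strtolower($token);
    }
}

/** Hypothetical example: drop tokens found in a caller-supplied list. */
class StopWordTransform implements TransformStrategyInterface
{
    private $stopWords;

    public function __construct(array $stopWords)
    {
        $this->stopWords = array_fill_keys($stopWords, true);
    }

    public function transform($token)
    {
        return isset($this->stopWords[$token]) ? null : $token;
    }
}

/**
 * Hypothetical helper: run every token through the transforms in order.
 * A null result removes the token and skips the remaining transforms.
 */
function applyTransforms(array $tokens, array $transforms)
{
    $result = array();
    foreach ($tokens as $token) {
        foreach ($transforms as $transform) {
            $token = $transform->transform($token);
            if ($token === null) {
                continue 2; // token removed, move on to the next token
            }
        }
        $result[] = $token;
    }
    return $result;
}

$tokens = array('The', 'Cat', 'sat', 'ON', 'the', 'mat');
$normalized = applyTransforms($tokens, array(
    new LowerCaseTransform(),
    new StopWordTransform(array('the', 'on')),
));
print_r($normalized);
```

Order matters in this sketch: lower-casing runs first so that 'The' and 'ON' match the lower-case stop word list.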

If you agree to this approach, write any comments back to me and close this pull request.

Thanks,
