Closed yooper closed 11 years ago
Hello,
I have no issues with you questioning my design decisions and implementation. Let me try to answer your questions:
Thanks, --Dan
On Tue, Aug 20, 2013 at 1:23 PM, Angelos Katharopoulos < notifications@github.com> wrote:
Again, I appreciate your work but I will unfortunately have to ask you to wait a bit for me. I am in the process of moving most tests to phpunit and deciding which of my current tests are actually functional instead of unit tests and how I am going to handle them. After that, I will also reformat the code with php coding standards fixer (https://github.com/fabpot/PHP-CS-Fixer) and then I will create a develop branch.
Now regarding the pull request.
- Why keep the rules in a different json file instead of keeping them in code like most Lancaster Stemmer implementations?
- What is the point of the IDataReaderInterface? NlpTools has Document and TrainingSet to abstract away the actual data (training data, test data). PHP has Iterator interface. I would suggest if you want to add a json reader simply to add a JsonDocument. Although if you look at the first point, what I am really suggesting is to not have a file dependency at all.
- NlpTools\Utils\Vowel should really be NlpTools\Utils\EnglishVowels. I think PorterStemmer uses the same concept of vowels, so although I would otherwise consider such a small abstraction overkill, I believe it could also be used elsewhere in the system.
- This one is something you wouldn't possibly know (not pushed yet), I am adding the tests in the same namespace as the class that is being tested, for instance PennTreeBankTokenizerTest is now in NlpTools\Tokenizers namespace.
So to sum up, I will push the develop branch in due time. Branch off of there, apply the above changes (or argue with me :-) ) and then I'll merge that one.
Hello,
    if (!(is_array($rules) || $rules instanceof \Traversable)) { throw new \InvalidArgumentException('...'); }
Also, the default rules could be a private static variable used when $rules is null.
    $stemmer = new LancasterStemmer(array(....rules here....));
for testing and flexibility.
Just as a general direction for the library: I have no intention of having resources of any kind in the repository (except for testing). No prebuilt models, datasets, etc. That is an additional issue that I have with the external file ruleset.
Angelos
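The constructor suggestion above can be sketched roughly as follows. This is a minimal illustration, not the actual NlpTools code: the class name LancasterStemmerSketch, the getRules() accessor, the exception message, and the two placeholder rules (including their array shape) are all assumptions made for the example. Only the type check and the idea of a private static default used when $rules is null come from the comment.

```php
<?php
// Hypothetical sketch; the real LancasterStemmer and its rule format may differ.
class LancasterStemmerSketch
{
    // Placeholder defaults kept in code, so no external rules file is needed.
    // These two entries are invented for illustration only.
    private static $defaultRules = array(
        array('suffix' => 'ing', 'replacement' => ''),
        array('suffix' => 'ies', 'replacement' => 'y'),
    );

    private $rules;

    public function __construct($rules = null)
    {
        if ($rules === null) {
            // Fall back to the built-in rules when none are given.
            $rules = self::$defaultRules;
        } elseif (!(is_array($rules) || $rules instanceof \Traversable)) {
            throw new \InvalidArgumentException('Rules must be an array or Traversable.');
        }

        // Normalise any Traversable to a plain list of rules.
        $this->rules = is_array($rules) ? $rules : iterator_to_array($rules, false);
    }

    // Accessor added only so the example can be inspected.
    public function getRules()
    {
        return $this->rules;
    }
}
```

With this shape, tests can pass a small custom rule set directly to the constructor, while ordinary callers construct the stemmer with no arguments and get the defaults.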
Per your requests I updated my pull request with your suggestions.
Thanks. I am sorry, but I will have to ask you to make some more changes.
I have done quite a bad job of keeping my git history clean and I would like to improve on that. I suggest keeping the develop branch synced with origin. If you want to make an addition, create a feature branch and make a pull request from there. I will also be doing that locally, so even though I will not be making pull requests to myself, I will have commits merged with --no-ff from local private feature branches.
So regarding this pull request, I suggest we close it. Then, in your local repo:
I am closing this pull request and I will resubmit the PR against the develop branch.