angeloskath / php-nlp-tools

Natural Language Processing Tools in PHP
Do What The F*ck You Want To Public License

Lancaster stemmer with empty string #34

Closed by hwsamuel 9 years ago

hwsamuel commented 9 years ago

The Lancaster stemmer does not exit gracefully when it is given an empty string. To reproduce, run the following code:

use NlpTools\Stemmers\LancasterStemmer;

$ls = new LancasterStemmer();
$ls->stem(''); // empty input triggers the notice below

It produces the following notice:

Notice: Uninitialized string offset: -1 in /php-nlp-tools/src/NlpTools/Stemmers/LancasterStemmer.php on line 86

I tried modifying the stem function by adding the condition if (strlen($word) == 0) return; to it. After that change, however, a new notice appears from another file, and I'm not sure how to fix it:

Notice: Uninitialized string offset: -1 in /php-nlp-tools/src/NlpTools/Utils/EnglishVowels.php on line 20
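
Both notices are consistent with code that reads the last character of a word at index strlen($word) - 1, which evaluates to -1 when the word is empty. The following is a minimal sketch of that pattern and of a guard against it. The helper lastChar() is hypothetical and is not part of php-nlp-tools; the actual code on the reported lines may differ.

```php
<?php
// Hypothetical helper, not part of php-nlp-tools: returns the last
// character of $word, or an empty string when there is none.
function lastChar($word)
{
    $length = strlen($word);
    if ($length === 0) {
        // Without this guard, $word[$length - 1] would read offset -1
        // of an empty string, which is what the notices complain about.
        return '';
    }
    return $word[$length - 1];
}

var_dump(lastChar('maximum'));
var_dump(lastChar(''));
```

Note that a bare return; such as the one tried above makes the function return null rather than a string, so callers that expect a string still need to handle that case.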

angeloskath commented 9 years ago

Hi, thanks for the find!

I have added the following simple fix:

if (empty($word))
    return $word;

I have also added a test for this particular bug, and everything seems to work fine. If there is something I am missing, could you point it out? Do you want me to push a temporary branch so you can test it?
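
For illustration, the same early return can be sketched outside the library. This is only a sketch under stated assumptions: stemOrPassThrough() and the toy stemmer below are hypothetical names, not the library's code and not the test that was added.

```php
<?php
// Hypothetical wrapper, not the library's code: applies the same
// early return before delegating to any stemming callable.
function stemOrPassThrough(callable $stem, $word)
{
    if (empty($word)) {
        // empty() is true for '' and null, and also for the string '0',
        // so all of these are handed back unchanged.
        return $word;
    }
    return $stem($word);
}

// Stand-in stemmer for illustration only: strips trailing "s" characters.
$toyStem = function ($word) {
    return rtrim($word, 's');
};

var_dump(stemOrPassThrough($toyStem, ''));
var_dump(stemOrPassThrough($toyStem, 'cats'));
```

Returning $word instead of using a bare return; keeps the return value a string for empty input, so code that consumes the stem does not receive null.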

hwsamuel commented 9 years ago

Thanks, this fix works. I've tested it in my code as well.