SangitaNLP / sangita

A Natural Language Toolkit for Indian Languages
Apache License 2.0
40 stars 41 forks

added an improved version of stemming algorithm #4

Closed imVivekGupta closed 6 years ago

imVivekGupta commented 7 years ago

3 splitting by the 2nd word of the corpora list doesn't appear to be very effective. Consider this:

'अंगूठे\tअंगूठा' Going by the implemented algorithm, we get 'अंगूठे' as the inflection, i.e. the whole word. Instead, I propose finding the common prefix of the two forms and using the remaining, non-shared suffix as the inflection. I have implemented this idea.
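
The common-prefix idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `split_on_common_prefix` is hypothetical, and it assumes each corpus entry is a tab-separated pair of inflected form and root, as in the example above.

```python
def split_on_common_prefix(entry):
    """Split a tab-separated 'inflected<TAB>root' entry (assumed format).

    Returns (stem, inflection, root_ending), where stem is the longest
    common prefix of the two forms, compared code point by code point.
    """
    inflected, root = entry.split("\t")
    n = 0
    for a, b in zip(inflected, root):
        if a != b:
            break
        n += 1
    return inflected[:n], inflected[n:], root[n:]


# Example entry taken from the comment above.
stem, inflection, root_ending = split_on_common_prefix("अंगूठे\tअंगूठा")
print(stem, inflection, root_ending)
```

For the example entry, the inflection comes out as just the trailing vowel sign rather than the whole word 'अंगूठे'. The comparison here is per code point, which is an assumption of this sketch; a real implementation might prefer to compare whole syllable clusters.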

djokester commented 7 years ago

The indentation is all wrong! Please do correct it. This is a good commit otherwise!

imVivekGupta commented 7 years ago

Corrected! :)

djokester commented 7 years ago

Errr... It's still wrong. Check the indentation of 'return words'.

imVivekGupta commented 7 years ago

Please check now.

djokester commented 7 years ago

'return words' should be just one indent level in from the function declaration. Are you testing the file before sending the PR?

imVivekGupta commented 7 years ago

"words" would be defined only when control enters the if block, wouldn't it? So the return should be two indent levels in from the function declaration. Yes, I have tested this file. It works.

djokester commented 7 years ago

Define words outside the if block. Add an error case in an else statement.
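
The structure being asked for here can be sketched as below. The function name `tokenize_input` and the string check are hypothetical stand-ins, since the actual function under review is not shown in this thread; reporting the error on stderr is likewise just one possible choice.

```python
import sys


def tokenize_input(text):  # hypothetical name, not the function from the PR
    # Define words before the if block so it exists on every code path.
    words = []
    if isinstance(text, str):
        words = text.split()
    else:
        # Error case for wrong input; words stays empty.
        print("Error: expected a string as input", file=sys.stderr)
    # One indent level in from the def line, so it runs for both branches.
    return words
```

With words initialised up front, the return no longer depends on the if branch having run, so it can sit directly under the function body.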

imVivekGupta commented 7 years ago

Added error case for wrong input. Please review.

djokester commented 6 years ago

Hey @imVivekGupta, sorry, I was busy for a few days! I think we can work with this. Give me a few hours to review it! Thank you!