Kowsher / Bangla-NLP

Error lemmatizing #20

Closed nayanbanik closed 4 years ago

nayanbanik commented 4 years ago

Today I was exploring your lemmatizer on some random raw text. Some nouns returned the wrong lemma, which is somewhat acceptable; for example, "শাকিব" becomes "শাক". But the word " ববি" causes an actual failure: the lemmatizer throws an error. Please check it, as it is a proper noun used as a name.

avishek-018 commented 4 years ago

Thank you for the report. We shall look into it.

nayanbanik commented 4 years ago

Well, the problem was relatively trivial. Inside your trie.py file, you index the dictionary _charmap directly, which raises an error when the character is not a key in it. Use the _charmap.get() method instead. Since this is the only error so far, I am closing this issue.
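For anyone landing here later, a minimal sketch of the kind of change meant above. This is not the actual code from trie.py: the `TrieNode` and `Trie` classes, the `is_word` flag, and the `longest_prefix` method are hypothetical names made up for illustration. Only the `_charmap` dictionary and the switch to `.get()` come from this thread.

```python
class TrieNode:
    """Hypothetical trie node; only the name _charmap comes from the thread."""

    def __init__(self):
        self._charmap = {}    # maps a character to its child TrieNode
        self.is_word = False  # True if a stored root word ends at this node


class Trie:
    """Illustrative trie of root words, not the repository's implementation."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node._charmap.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_prefix(self, word):
        """Return the longest stored root that is a prefix of word, or ''."""
        node = self.root
        best = ""
        for i, ch in enumerate(word):
            # node._charmap[ch] would raise KeyError for an unseen character;
            # .get() returns None instead, so the walk can stop cleanly.
            nxt = node._charmap.get(ch)
            if nxt is None:
                break
            node = nxt
            if node.is_word:
                best = word[: i + 1]
        return best


trie = Trie()
trie.insert("শাক")
print(trie.longest_prefix("শাকিব"))        # matches the stored root "শাক"
print(repr(trie.longest_prefix("ববি")))    # no stored root matches; no KeyError
```

With direct indexing, a word whose first character has no entry in the root node's map would fail on the very first lookup. With `.get()`, the caller gets an empty result and can decide what to do, for example return the input word unchanged.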

Kowsher commented 4 years ago

Basically, we worked on Bangla words, not on names. Besides, we have a shortage of Bangla root words; if we include Bangla personal names and increase the number of Bangla words, it will work perfectly. Since the algorithm is trained based on the label