cadmiumcr / cadmium

Natural Language Processing (NLP) library for Crystal
https://cadmiumcr.com
MIT License

Add a POS tagger #6

Open watzon opened 5 years ago

watzon commented 5 years ago

POS tagging categorizes each word in a sentence by its part of speech, relative to the other words in the sentence. A simple version can be built on WordNet lookups, but full POS tagging requires tagging each word based on its relationship to the previous word and the next word. Some "words" are also made up of multiple grams, such as "New York".
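To make the "previous word and next word" idea concrete, here is a minimal sketch in Python. It is not Cadmium's API and not a proposed implementation: the `LEXICON` table, the tag names, and the `tag` function are all hypothetical, and the two hand-written rules stand in for whatever statistical model a real tagger would use.

```python
# Hypothetical toy lexicon: each word maps to the tags it may take.
LEXICON = {
    "i": ["PRON"],
    "the": ["DET"],
    "a": ["DET"],
    "book": ["NOUN", "VERB"],  # ambiguous without context
    "flight": ["NOUN"],
    "read": ["VERB"],
}


def candidates_for(token):
    # Assumption: unknown words are treated as nouns.
    return LEXICON.get(token.lower(), ["NOUN"])


def tag(tokens):
    """Tag tokens, resolving ambiguous words from their neighbours."""
    tagged = []
    for i, token in enumerate(tokens):
        candidates = candidates_for(token)
        choice = candidates[0]
        if len(candidates) > 1:
            prev_tag = tagged[-1][1] if tagged else None
            next_cands = candidates_for(tokens[i + 1]) if i + 1 < len(tokens) else []
            if prev_tag == "DET" and "NOUN" in candidates:
                # "the book": a determiner is usually followed by a noun.
                choice = "NOUN"
            elif (prev_tag == "PRON" or "DET" in next_cands) and "VERB" in candidates:
                # "I book ..." or "book a ...": a verb reading fits better.
                choice = "VERB"
        tagged.append((token, choice))
    return tagged
```

With this sketch, `tag(["I", "book", "a", "flight"])` tags "book" as a verb, while `tag(["I", "read", "the", "book"])` tags it as a noun, purely from the neighbouring words.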

The tagger should be able to see a word like "New" and know that if "York", "Jersey", "Amsterdam", or any number of other words appears next to it, there is a very good chance the two should be counted as one word, and a proper noun at that.
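One way to picture the merging step is a lookup pass that runs before tagging. The sketch below is in Python and is only an illustration: the `COMPOUNDS` table, the `PROPN` tag, and `merge_compounds` are hypothetical names, and a fixed lookup is a simplification of the "very good possibility" described above, which a real tagger would more likely model with probabilities.

```python
# Hypothetical table: first word -> words that may follow it to form one proper noun.
COMPOUNDS = {
    "New": {"York", "Jersey", "Amsterdam"},
}


def merge_compounds(tokens):
    """Join known two-word proper nouns into a single tagged token."""
    merged = []
    i = 0
    while i < len(tokens):
        followers = COMPOUNDS.get(tokens[i])
        if followers and i + 1 < len(tokens) and tokens[i + 1] in followers:
            # Treat the pair as one word and mark it as a proper noun.
            merged.append((tokens[i] + " " + tokens[i + 1], "PROPN"))
            i += 2
        else:
            # Left untagged (None) for a later tagging pass.
            merged.append((tokens[i], None))
            i += 1
    return merged
```

For example, `merge_compounds(["She", "moved", "to", "New", "York"])` yields a single `("New York", "PROPN")` entry, while "New" in `["A", "New", "idea"]` stays a separate, untagged token.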