AndyIbanez / andyibanez-com

Static website.

posts/tokenizing-nltokenizer/ #21

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Tokenizing Natural Language into Semantic Units in iOS • Andy Ibanez

https://www.andyibanez.com/posts/tokenizing-nltokenizer/

alamodey commented 3 years ago

Thanks for this article. I just tested the code and it does accurately tokenize a string of Japanese into separate words. But I am trying to do something more advanced, where I tag each word as a noun, verb, etc.

I am just using the basic code example here: https://developer.apple.com/documentation/naturallanguage/identifying_parts_of_speech

It seems to work with English, but when I input Japanese it just tags every word as OtherWord. Have you tried using the tagger and had much luck with it? Thanks.

AndyIbanez commented 3 years ago

@alamodey I actually have an article on NLTagger here.

Japanese is a highly contextual language, so my immediate guess is that you are handing it single words. I don't remember much about this API, but I think you can hand it "bigger components", such as whole sentences, to get more accurate results.
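
For anyone following along, here is a minimal sketch of what handing the tagger a whole sentence could look like. The helper name `lexicalClasses(in:)` and the sample sentence are made up for illustration; the `NLTagger` calls themselves are the standard NaturalLanguage ones. Treat it as a sketch, not as the code from the article.

```swift
import NaturalLanguage

// Hypothetical helper (not from the article): tags every word in `text`
// with its lexical class, giving the tagger the full string as context.
func lexicalClasses(in text: String) -> [(word: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text  // the whole sentence, not isolated words

    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    var results: [(word: String, tag: String)] = []

    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, range in
        // `tag` is nil when the tagger has nothing to say about this token.
        results.append((word: String(text[range]), tag: tag?.rawValue ?? "none"))
        return true  // keep enumerating
    }
    return results
}

// Sample sentence is an assumption; swap in your own Japanese text to compare.
for (word, tag) in lexicalClasses(in: "The quick brown fox jumps over the lazy dog.") {
    print("\(word): \(tag)")
}
```

If every token still comes back as `OtherWord` with a full sentence as input, context is probably not the problem.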

alamodey commented 3 years ago

I think it's as you demonstrated in the article: lexical class tagging is supported for English, but not for Japanese.
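
One way to check this rather than guess is to ask the tagger which schemes it offers for a given language. This is a small sketch that assumes `NLTagger.availableTagSchemes(for:language:)` reflects what is actually supported on the device it runs on; the helper name `supportsLexicalClass(_:)` is hypothetical.

```swift
import NaturalLanguage

// Hypothetical helper: reports whether word-level lexical-class tagging
// is listed as available for the given language on this device.
func supportsLexicalClass(_ language: NLLanguage) -> Bool {
    NLTagger.availableTagSchemes(for: .word, language: language)
        .contains(.lexicalClass)
}

print("English:", supportsLexicalClass(.english))
print("Japanese:", supportsLexicalClass(.japanese))
```

The result can differ between OS versions, so it is worth running on the actual target device.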