PersDigUMD / MorphologyServiceAPI


POS tags are not correct #2

Status: Open. maryam-foradi opened this issue 8 years ago.

maryam-foradi commented 8 years ago

For some words the service doesn't return anything, e.g. لطف: http://services.perseids.org/pysvc/morphologyservice/analysis/word?word=%D9%84%D8%B7%D9%81&lang=per&engine=hazm

For others it gives the wrong POS, e.g. http://services.perseids.org/pysvc/morphologyservice/analysis/word?word=%D8%A8%DA%AF%D9%88&lang=per&engine=hazm tags بگو as a noun, although بگو is a verb.
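The two links differ only in the percent-encoded `word` parameter, which is the UTF-8 encoding of the Persian word. A minimal standard-library sketch that decodes the reported words and rebuilds such a URL; the `word`, `lang` and `engine` parameters are copied from the links above, but the helper `analysis_url` is hypothetical and not part of the service's code.

```python
from urllib.parse import urlencode, unquote

# Endpoint copied from the URLs in the report above.
BASE_URL = "http://services.perseids.org/pysvc/morphologyservice/analysis/word"


def analysis_url(word: str, lang: str = "per", engine: str = "hazm") -> str:
    """Build a query URL for the morphology service (hypothetical helper).

    urlencode() percent-encodes the UTF-8 bytes of the Persian word,
    which is how the word appears in the links in this issue.
    """
    query = urlencode({"word": word, "lang": lang, "engine": engine})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Decode the two reported words to see which Persian strings they are.
    print(unquote("%D9%84%D8%B7%D9%81"))  # → لطف (no result returned)
    print(unquote("%D8%A8%DA%AF%D9%88"))  # → بگو (tagged as a noun)
    # Rebuild the second link from the plain word.
    print(analysis_url("بگو"))
```

Decoding the query makes it easier to confirm which word a bug report refers to before testing the tagger directly.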

balmas commented 8 years ago

I'm not sure whether this is a problem with the way we are using Hazm or something else.

The service calls `tagger.tag('بگو')` and gets back `[('ب', 'N'), ('گ', 'N'), ('و', 'CONJ')]`, i.e. one tag per character instead of one tag for the whole word.
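One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: Hazm's tagger is normally called with a list of tokens (e.g. `tagger.tag(word_tokenize(text))`), and a Python `str` is itself iterable, so a bare word would be tagged character by character. The sketch below uses a hypothetical `stub_tag` function and a toy lexicon in place of Hazm (no trained model needed) just to show the difference in output shape.

```python
# Toy lexicon standing in for a trained model (illustration only).
LEXICON = {"بگو": "V"}


def stub_tag(tokens):
    """Hypothetical tagger: pair every element of `tokens` with a tag.

    Like a token-based tagger, it iterates over whatever it is given.
    Unknown tokens default to "N" only to mimic the output shape.
    """
    return [(token, LEXICON.get(token, "N")) for token in tokens]


word = "بگو"

# Bare string: iterating a str yields its characters, so each letter
# is tagged separately, like the output quoted above.
print(stub_tag(word))    # three (character, tag) pairs

# Single word wrapped in a list: one token, one tag.
print(stub_tag([word]))  # → [('بگو', 'V')]
```

If this is the cause, wrapping the single word Arethusa sends in a one-element list before tagging would be enough to restore word-level tags.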

@elijahjcooke any thoughts?

elijahjcooke commented 8 years ago

So the problem is that, for some reason, Hazm is not tokenizing the text correctly. Hazm should break the text into sentences and then break each sentence into words, but instead it is splitting the word into individual characters. Maryam, does this happen when you send texts with multiple sentences, or only when you send single sentences or single words? Knowing that will help us track down the tokenizing bug.
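A rough sketch of the sentence-then-word pipeline described above, assuming naive regex and whitespace splitting as stand-ins for Hazm's `sent_tokenize()` and `word_tokenize()`; the helper names are hypothetical. Whatever the input, the result is a list of token lists, so even a single word comes out as a one-element list.

```python
import re
from typing import List

# Sentence-final punctuation, including the Arabic question mark "؟" (U+061F).
PUNCT = ".!?\u061F"
# Split on whitespace that follows sentence-final punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for a sentence tokenizer."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def split_words(sentence: str) -> List[str]:
    """Naive stand-in for a word tokenizer: whitespace split, punctuation stripped.

    Real Persian tokenization also has to handle ZWNJ, clitics, etc.
    """
    words = (w.strip(PUNCT) for w in sentence.split())
    return [w for w in words if w]


def tokenize(text: str) -> List[List[str]]:
    """Text -> sentences -> word lists, the shape a token-based tagger expects."""
    return [split_words(s) for s in split_sentences(text)]


print(tokenize("بگو"))  # → [['بگو']]
```

Each inner list would then be passed to the tagger as one sentence's tokens.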

balmas commented 8 years ago

Arethusa currently only sends single words to the parser, not entire sentences.

maryam-foradi commented 8 years ago

I haven't tried it with multiple sentences, as it makes the treebanking complicated, if not impossible.

On Wed, Feb 10, 2016 at 2:22 PM, Bridget Almas wrote:

> Arethusa currently only sends single words to the parser, not entire sentences.


elijahjcooke commented 8 years ago

OK, then I might know a fix for the problem. @balmas, will Arethusa be updated automatically if I change the code on GitHub?

elijahjcooke commented 8 years ago

@balmas

balmas commented 8 years ago

Thanks! I'll deploy tomorrow!

balmas commented 8 years ago

Ah, sorry, I misunderstood the question here. The morphology service API will not be updated automatically, but I'm happy to deploy for testing when you're ready.