buriy / spacy-ru

Russian language models for spaCy
MIT License

Detection of noun chunks #12

Open miloskovacevic68 opened 4 years ago

miloskovacevic68 commented 4 years ago

Hello, I plan to use spacy-ru (its POS tagger) in my research to detect Russian noun chunks. Since I am from Serbia, I don't know which sequences of POS tags represent noun chunks in Russian. For example, in Serbian some of them are ADVERB NOUN, NOUN NOUN, ADVERB NOUN NOUN, etc. Thanks in advance :)

buriy commented 4 years ago

Thanks. Other people also say this doesn't work for some reason. I'll check, and we'll fix it before the 2.1 release, which should happen in approximately 2 weeks.

As for what noun chunks are in Russian: adjective + noun, where the words share the same case, gender, and number (singular/plural). This is similar to English: "colorful painting". Another way of combining words is noun + another noun, where the right noun should be in the genitive case (gent). This is the equivalent of the English "noun noun" composition, in which the left noun describes some property of the right noun ("red train", "men toilet", "Michael's idea"). Do you need to consider this a single noun chunk? In English, "stone hall" is a noun chunk, but "hall (made) of stone" is not, I guess? Pronouns can be used instead of a noun ("my thought").

In some cases, if the left noun is also in the genitive, you need to know the semantics of the words to decide whether they should be separated into different chunks. I'll check how this is tackled in English parsing.
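To make the agreement rule described above concrete, here is a minimal sketch in plain Python. It is not spacy-ru's or spaCy's API: `Token`, `agrees`, `noun_chunks`, and the `attach_genitive` flag are hypothetical names invented for this illustration, and the tag values (`ADJ`, `DET`, `Nom`, `Gen`, and so on) assume Universal Dependencies style labels. Treating possessive pronouns as `DET` modifiers is also an assumption. The flag only reflects the open question of whether a following genitive noun belongs to the same chunk; it does not resolve the semantic ambiguity mentioned for chains of genitives.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed UD-style tags; a real tagger may use different labels.
MODIFIER_POS = {"ADJ", "DET"}   # adjectives and possessive pronouns ("my")
HEAD_POS = {"NOUN", "PROPN"}


@dataclass
class Token:
    """Hypothetical token record; not a spaCy class."""
    text: str
    pos: str
    case: Optional[str] = None     # e.g. "Nom", "Gen"
    gender: Optional[str] = None   # e.g. "Masc", "Fem", "Neut"
    number: Optional[str] = None   # "Sing" or "Plur"


def agrees(mod: Token, head: Token) -> bool:
    """True if the modifier matches the head in case and number,
    and also in gender when the head is singular."""
    if mod.case != head.case or mod.number != head.number:
        return False
    # Russian adjectives do not distinguish gender in the plural.
    return head.number == "Plur" or mod.gender == head.gender


def noun_chunks(tokens: List[Token], attach_genitive: bool = False) -> List[List[Token]]:
    """Group agreeing modifiers with the noun that follows them.

    If attach_genitive is True, genitive nouns directly after the head
    are kept in the same chunk (the "noun + noun in genitive" pattern).
    """
    chunks: List[List[Token]] = []
    i, n = 0, len(tokens)
    while i < n:
        start = i
        # Collect a run of candidate modifiers.
        while i < n and tokens[i].pos in MODIFIER_POS:
            i += 1
        if i < n and tokens[i].pos in HEAD_POS:
            head = tokens[i]
            # Drop leading modifiers until all remaining ones agree with the head.
            while start < i and not all(agrees(m, head) for m in tokens[start:i]):
                start += 1
            end = i + 1
            if attach_genitive:
                while end < n and tokens[end].pos in HEAD_POS and tokens[end].case == "Gen":
                    end += 1
            chunks.append(tokens[start:end])
            i = end
        elif i == start:
            # Neither a modifier nor a head: skip this token.
            i += 1
    return chunks


if __name__ == "__main__":
    phrase = [
        Token("красочная", "ADJ", "Nom", "Fem", "Sing"),
        Token("картина", "NOUN", "Nom", "Fem", "Sing"),
    ]
    for chunk in noun_chunks(phrase):
        print(" ".join(t.text for t in chunk))
```

With `attach_genitive=False`, a phrase such as "идея Михаила" ("Michael's idea") yields two separate chunks; with `attach_genitive=True` it is returned as one. Which behaviour is wanted depends on the answer to the question above about whether noun + genitive noun should count as a single chunk.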

buriy commented 4 years ago

Duplicate of #5, I'll update that one when it's ready.