chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

In POS not getting Nouns (pattern=r’<NOUN>+’) #236

Closed SreejaRoyal closed 5 years ago

SreejaRoyal commented 5 years ago

Hi, this is my code. In the response I want only nouns, but I am also getting an adjective.

>>> import textacy
>>> textacy.__version__
'0.6.2'
>>> text = "Two weeks ago, I was in kuwait participating in an I.M.F. seminar for Arab educators."
>>> doc = textacy.Doc(text)
>>> pattern = "+"
>>> list1 = list(textacy.extract.pos_regex_matches(doc, pattern))
>>> text1 = " ".join(str(x) for x in list1)
>>> doc1 = textacy.Doc(text1)
>>> for token in doc1:
...     dictObj = {}
...     dictObj["text"] = str(token)
...     dictObj["pos"] = str(token.pos_)
...     print(dictObj)
...
{'text': 'weeks', 'pos': 'NOUN'}
{'text': 'kuwait', 'pos': 'ADJ'}
{'text': 'seminar', 'pos': 'NOUN'}
{'text': 'educators', 'pos': 'NOUN'}

How can I resolve this issue?

operating system:
python version: 3.5
spacy version: 2.0
installed spacy models: [en]
textacy version: 0.6.2

bdewilde commented 5 years ago

Hi @SreejaRoyal, it looks like your pattern is just "+" and not "<NOUN>+". Does this still happen with the right regex pattern?

bdewilde commented 5 years ago

Hey again, I added a test for this issue, and behavior is as expected. I'm assuming the issue was in your pattern. So, I'm closing this issue, but please reopen if I've misdiagnosed the problem.