Closed by LeeYN-43 2 years ago
To extract noun phrases, see https://www.pythonprogramming.in/how-to-extract-noun-phrases-using-textblob.html. We extract the noun phrases with the script below.
from textblob import TextBlob
blob = TextBlob("An old woman is dancing on the green grass")
# a bare expression only displays in a REPL; print it when running as a script
print(blob.noun_phrases)
When TextBlob fails to extract noun phrases from a text, you can fall back to extracting single nouns with NLTK, following https://stackoverflow.com/questions/33587667/extracting-all-nouns-from-a-text-file-using-nltk. We use the example below:
import nltk
lines = 'lines is some string of words'
# function to test if something is a noun
is_noun = lambda pos: pos[:2] == 'NN'
# do the nlp stuff
tokenized = nltk.word_tokenize(lines)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
print(nouns)
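The fallback described above can be sketched without any NLP library, to show what the 'NN' prefix test selects and how the two extraction paths might be combined. This is only an illustration: the helper names `nouns_from_tagged` and `phrases_or_nouns` are hypothetical, and the (word, tag) pairs are hand-written stand-ins for what `nltk.pos_tag` would return, not real tagger output.

```python
def is_noun(tag):
    """Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with 'NN'."""
    return tag[:2] == "NN"


def nouns_from_tagged(tagged):
    """Keep only the words whose POS tag marks them as nouns."""
    return [word for word, tag in tagged if is_noun(tag)]


def phrases_or_nouns(noun_phrases, tagged):
    """Hypothetical helper: prefer noun phrases, fall back to single nouns."""
    phrases = list(noun_phrases)
    return phrases if phrases else nouns_from_tagged(tagged)


# Hand-written (word, tag) pairs standing in for nltk.pos_tag output
# (an assumption for illustration, not produced by a real tagger).
tagged = [
    ("An", "DT"), ("old", "JJ"), ("woman", "NN"), ("is", "VBZ"),
    ("dancing", "VBG"), ("on", "IN"), ("the", "DT"),
    ("green", "JJ"), ("grass", "NN"),
]

print(nouns_from_tagged(tagged))
# An empty phrase list triggers the fallback to single nouns.
print(phrases_or_nouns([], tagged))
# A non-empty phrase list is returned unchanged.
print(phrases_or_nouns(["old woman"], tagged))
```

In practice, `blob.noun_phrases` would be passed as the first argument and `nltk.pos_tag(tokenized)` as the second.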
To extract verb phrases, see https://microeducate.tech/extract-verb-phrases-using-spacy/. We use the "Edit 2" version from that page, shown below:
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans
nlp = spacy.load('en_core_web_sm')
sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]
# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
# spaCy v3 takes a list of patterns; spaCy v2 used matcher.add("Verb phrase", None, pattern)
matcher.add("Verb phrase", [pattern])
doc = nlp(sentence)
# call the matcher to find matches
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]
print(filter_spans(spans))
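The matcher returns every sub-match of the pattern, so overlapping candidates such as "writing" and "is writing" both appear before filtering. As a rough sketch of what the `filter_spans` step does (prefer the longest span, drop anything that overlaps a span already kept), here is a plain-Python analogue working on (start, end) token offsets. The function name `keep_longest_non_overlapping` and the sample offsets are hypothetical, and this is not spaCy's actual implementation.

```python
def keep_longest_non_overlapping(spans):
    """Keep the longest spans first and drop any span that overlaps one already kept.

    `spans` is an iterable of (start, end) token offsets, end exclusive.
    """
    # Longest span first; among equal lengths, the earlier start wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    used_tokens = set()
    for start, end in ordered:
        if all(i not in used_tokens for i in range(start, end)):
            kept.append((start, end))
            used_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# Hypothetical offsets: one single-token match plus three overlapping
# candidates covering tokens 3 and 4 (e.g. "is", "is writing", "writing").
candidates = [(0, 1), (3, 4), (3, 5), (4, 5)]
print(keep_longest_non_overlapping(candidates))
```

Only the two-token span survives among the overlapping candidates, which is why the filtered output contains whole verb phrases rather than their fragments.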
Did you extract the nouns directly or extract the noun phrases for training? Thanks. @geyuying
Thanks for your great work! I want to extract noun phrases and verbs from my own dataset. Could you please tell me which tool you used to extract them?