TencentARC / MCQ

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

How to extract noun phrases and verbs? #3

Closed LeeYN-43 closed 2 years ago

LeeYN-43 commented 2 years ago

Thanks for your great work! I want to extract noun phrases and verbs from my own dataset. Could you please tell me what tool you used to extract them?

geyuying commented 2 years ago

To extract noun phrases, you can refer to https://www.pythonprogramming.in/how-to-extract-noun-phrases-using-textblob.html. We extract the noun phrases using the script below.

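# Requires the TextBlob corpora once: python -m textblob.download_corpora
from textblob import TextBlob

blob = TextBlob("An old woman is dancing on the green grass")
# noun_phrases returns a WordList of the noun phrases found in the text
print(blob.noun_phrases)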

When "TextBlob" fails to extract any noun phrases from a text, you can refer to https://stackoverflow.com/questions/33587667/extracting-all-nouns-from-a-text-file-using-nltk. We use the example below:

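import nltk

# one-time download of the tokenizer and POS-tagger models; resource names
# differ between NLTK releases, so adjust them if NLTK reports a missing resource
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

lines = 'lines is some string of words'
# function to test if something is a noun (Penn Treebank noun tags start with 'NN')
is_noun = lambda pos: pos[:2] == 'NN'
# tokenize, POS-tag, and keep only the nouns
tokenized = nltk.word_tokenize(lines)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]

print(nouns)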
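
The two steps above (try noun phrases first, fall back to plain nouns when nothing is found) can be sketched as one helper. This is only an illustration, not the script used for the paper: `extract_with_fallback` and `words_with_tag_prefix` are hypothetical names, and the tagged sentence is hand-written to stand in for `nltk.pos_tag` output so the snippet runs without any model downloads. Since the question also asks about verbs, the same tag filter is shown with the 'VB' prefix.

```python
from typing import Callable, Iterable, List, Tuple


def words_with_tag_prefix(tagged: Iterable[Tuple[str, str]], prefix: str) -> List[str]:
    """Keep words whose Penn Treebank tag starts with `prefix` ('NN' nouns, 'VB' verbs)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


def extract_with_fallback(
    text: str,
    primary: Callable[[str], Iterable[str]],
    fallback: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return the primary extractor's result, or the fallback's if the primary finds nothing."""
    found = list(primary(text))
    return found if found else list(fallback(text))


# Hand-written (word, tag) pairs standing in for nltk.pos_tag(...) output (assumption).
tagged = [
    ("An", "DT"), ("old", "JJ"), ("woman", "NN"), ("is", "VBZ"),
    ("dancing", "VBG"), ("on", "IN"), ("the", "DT"), ("green", "JJ"), ("grass", "NN"),
]

nouns = words_with_tag_prefix(tagged, "NN")
verbs = words_with_tag_prefix(tagged, "VB")

# Simulate a primary extractor that returns nothing, so the fallback is used.
result = extract_with_fallback(
    "An old woman is dancing on the green grass",
    primary=lambda text: [],
    fallback=lambda text: nouns,
)
print(nouns, verbs, result)
```

With the real libraries, `primary` could be `lambda text: TextBlob(text).noun_phrases` and `fallback` could wrap the NLTK tokenize-and-tag snippet above.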

To extract verb phrases, you can refer to https://microeducate.tech/extract-verb-phrases-using-spacy/. We use the "Edit 2" version, shown below:

import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

# requires the model once: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
# optional verb, any adverbs, any auxiliaries, then one or more verbs
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
# spaCy v3 expects a list of patterns; the older add(key, None, pattern) form fails there
matcher.add("Verb phrase", [pattern])

doc = nlp(sentence)
# call the matcher to find matches
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]

# filter_spans drops overlapping matches, preferring the longest span
print(filter_spans(spans))
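
The final filter_spans call matters because the pattern can match both a short span and a longer span that contains it (for example "writing" and "is writing"). The sketch below illustrates that kind of filtering on plain (start, end) token offsets. It is an assumption-laden illustration of the documented "longest span wins" behaviour, not spaCy's actual code; `keep_longest_non_overlapping` and the sample offsets are hypothetical.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def keep_longest_non_overlapping(spans: Iterable[Span]) -> List[Span]:
    """Drop spans that overlap an already kept span, preferring longer spans."""
    # Longest spans first; among equal lengths, the span that starts earlier wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    used_tokens = set()
    for start, end in ordered:
        # A span is kept only if neither its first nor its last token is taken.
        if start not in used_tokens and (end - 1) not in used_tokens:
            kept.append((start, end))
            used_tokens.update(range(start, end))
    return sorted(kept)  # restore document order


# Hypothetical matcher output: (3, 4) lies inside the longer match (2, 4).
candidates = [(3, 4), (2, 4), (7, 8)]
filtered = keep_longest_non_overlapping(candidates)
print(filtered)
```

Here the shorter match (3, 4) is dropped because the longer match (2, 4) already covers its token.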

vateye commented 2 years ago

Did you extract the nouns directly or extract the noun phrases for training? Thanks. @geyuying