Closed · myrainbowandsky closed this 4 years ago
Hi @myrainbowandsky, sorry for the delayed response. This is really up to you, and your "stupid" method is pretty similar to what I might do in your position — there's nothing inherently wrong with filtering down to certain categories out of the full results.
That said, if you're looking for elegance, there are a couple fixes I'd recommend:
from collections import Counter

# extra parentheses around "token" when calling "parse(token)" are unnecessary
c_counts = Counter(category for token in Corpus['text'][1] for category in parse(token))
# use a set for faster performance — Python can check whether an item is in a set in constant
# time, vs. asking if `some_string in some_list`, in which case Python looks at each item in
# `some_list` and checks whether `some_list[i] == some_string`
name_list = {
'focuspast (Past Focus)',
'focusfuture (Future Focus)',
'cogproc (Cognitive Processes)',
}
for k, v in c_counts.items():
    # a simple "if" will do the same thing as your "while ... break"
    if k in name_list:
        print(k, v)
You could also create a new count dictionary, rather than filtering while you're printing:
selected_c_counts = {k: v for k, v in c_counts.items() if k in name_list}
Finally, and this is a bit more involved: if you really want to optimize performance, you could dig into the source code and filter out unwanted categories while parsing the lexicon, which would reduce the size of the trie used to look up matches for each token. But that's probably a lot more work than you need to do to solve your problem :)
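If you did want to go down that road, here's a rough sketch of the idea. It assumes the token parser is built from read_dic / build_trie / search_trie helpers like the ones in this package's source (check your installed version, since the internal names and module paths may differ), and the dictionary filename below is just a placeholder:

# Sketch only: assumes the internal helpers read_dic, build_trie, and
# search_trie (names/paths may differ in your installed version), and a
# placeholder .dic filename.
from liwc.dic import read_dic
from liwc.trie import build_trie, search_trie

def load_filtered_token_parser(filepath, keep_categories):
    lexicon, category_names = read_dic(filepath)
    # keep only the categories you care about for each lexicon entry,
    # and drop entries that end up with no categories at all
    filtered_lexicon = {}
    for pattern, categories in lexicon.items():
        kept = [c for c in categories if c in keep_categories]
        if kept:
            filtered_lexicon[pattern] = kept
    # the trie is now built from a much smaller lexicon
    trie = build_trie(filtered_lexicon)

    def parse_token(token):
        for category_name in search_trie(trie, token):
            yield category_name

    return parse_token, [c for c in category_names if c in keep_categories]

parse, category_names = load_filtered_token_parser(
    'your_dictionary.dic',  # placeholder path
    {'focuspast (Past Focus)', 'focusfuture (Future Focus)', 'cogproc (Cognitive Processes)'},
)

Everything downstream (the Counter, the printing) would stay the same; parse just never yields categories you don't care about.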
I got:
What if I just want to show, say,
I have a stupid method using
Is there a smarter approach?