Closed by GHLgh 7 years ago
How about we change it slightly, in this way? In the example, retrieve a random document (say, this one: https://github.com/ryanmcdermott/trump-speeches/blob/master/speeches.txt ). Then count all the verbs (POS = VB, VBD, VBG, VBN, VBP, VBZ) that occur "immediately after" a person (NER = PER). (By "immediately after" I mean: after the person mention, in the same sentence, within a window of 3 words.)
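Concretely, the counting could be sketched like this, in pure Python over plain tag lists (no particular pipeline API is assumed; the function name and data shapes are illustrative):

```python
from collections import Counter

# Penn Treebank verb tags
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def count_verbs_after_person(pos_tags, ner_spans, window=3):
    """Count verbs occurring within `window` tokens after a PER span.

    pos_tags:  list of (token, pos) pairs for one sentence
    ner_spans: list of (start, end, label) token spans for that sentence
    """
    counts = Counter()
    for start, end, label in ner_spans:
        if label != "PER":
            continue
        # look only at the `window` tokens immediately following the span
        for token, pos in pos_tags[end:end + window]:
            if pos in VERB_TAGS:
                counts[token] += 1
    return counts

# toy sentence: "Donald Trump said he will win ."
pos = [("Donald", "NNP"), ("Trump", "NNP"), ("said", "VBD"),
       ("he", "PRP"), ("will", "MD"), ("win", "VB"), (".", ".")]
ner = [(0, 2, "PER")]
print(count_verbs_after_person(pos, ner))  # Counter({'said': 1})
```

Note that "win" is not counted here: it sits 4 tokens after the PER span, just outside the 3-token window.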
What do you think?
It's doable; I can try that.
When you said 3 words, did you mean 3 tokens? I ask because punctuation marks are also counted as tokens, right?
Yeah tokens should be fine.
BTW, we shouldn't send everything to the pipeline all at once. We can split on newlines and tabs before sending it to the pipeline.
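One way to do that split, as a sketch (the function name is made up, not part of sioux):

```python
import re

def split_for_pipeline(raw_text):
    """Split raw text on newlines and tabs, dropping empty pieces,
    so each piece can be sent to the pipeline separately."""
    return [piece.strip()
            for piece in re.split(r"[\n\t]+", raw_text)
            if piece.strip()]

text = "First speech paragraph.\n\nSecond paragraph.\tAside note."
print(split_for_pipeline(text))
# ['First speech paragraph.', 'Second paragraph.', 'Aside note.']
```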
Also, as a general comment, the current usage is not easy. We should make it easier to access neighboring tokens somehow. https://github.com/CogComp/sioux/pull/48/files#diff-dc8b50acc65729bc37a3b573f4ab541eR31
Also, being able to iterate over a view would be useful, IMO:
for ner_token in pipeline.get_ner(doc):
    print(ner_token['label'])
Good idea, I can make the class an iterator; then we can get rid of some_view_class.get_cons().
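Making the class an iterator could look roughly like this (a sketch only; the class and field names are illustrative, not the actual sioux API):

```python
class View:
    """Minimal sketch of a view class made iterable."""

    def __init__(self, constituents):
        self._constituents = constituents

    def __iter__(self):
        # yields constituents directly, so callers no longer need get_cons()
        return iter(self._constituents)

    def __getitem__(self, index):
        # find a constituent by index, as discussed above
        return self._constituents[index]

ner_view = View([{"label": "PER", "tokens": "Donald Trump"},
                 {"label": "LOC", "tokens": "New York"}])
for con in ner_view:
    print(con["label"])
# PER
# LOC
```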
@bhargav how would you want accessing neighboring tokens to work? If we can iterate over the view and find a constituent by index, would that be sufficient?
I can make the usage simpler by adding the corresponding tokens to the constituent (then ner_con['tokens'] would hold the tokens of that constituent). Right now we have to do some_view.get_cons(key='token')[constituent_index].
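The proposed change could be sketched like this (the names and data shapes are illustrative, not the actual sioux data model):

```python
# A constituent as a plain dict with token-offset boundaries.
constituent = {"label": "PER", "start": 0, "end": 2}
tokens = ["Donald", "Trump", "said", "he", "will", "win", "."]

# Proposed: store the constituent's own tokens on the constituent itself,
# so callers can read ner_con['tokens'] instead of indexing into
# some_view.get_cons(key='token').
constituent["tokens"] = tokens[constituent["start"]:constituent["end"]]
print(constituent["tokens"])  # ['Donald', 'Trump']
```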
@danyaljj Example for the first bullet point in #44
We can close this PR after the example is put in an IPython notebook.