clir / clearnlp

Software and resources for natural language processing.
http://www.clearnlp.com

Comparison of ClearNLP 2.0 and ClearNLP 3.1 #7

Open mariana-romanyshyn opened 9 years ago

mariana-romanyshyn commented 9 years ago
  1. Missing "nsubj" dependency with correct POS tagging. I've noticed that the "nsubj" dependency disappeared in some cases where the subject is relatively far from the predicate. It worked fine in ClearNLP 2.0. For example, there is no nsubj(stare, People) dependency in these sentences:

People_NNS I_PRP 've_VBP known_VBN for_IN years_NNS ,_, who_WP I_PRP used_VBD to_TO greet_VB in_IN passing_VBG on_RP the_DT street_NN ,_, now_RB stare_VB at_IN me_PRP with_IN a_DT mixture_NN of_IN fear_NN and_CC hatred_NN ._.

People_NNS ,_, who_WP I_PRP used_VBD to_TO greet_VB in_IN passing_VBG on_RP the_DT street_NN ,_, now_RB stare_VB at_IN me_PRP with_IN a_DT mixture_NN of_IN fear_NN and_CC hatred_NN ._.

But it comes back if I remove the clause:

People_NNS I_PRP 've_VBP known_VBN for_IN years_NNS now_RB stare_VBP at_IN me_PRP with_IN a_DT mixture_NN of_IN fear_NN and_CC hatred_NN ._.
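For anyone who wants to script this kind of check, here is a minimal sketch of reading the word_TAG notation used in this report and testing whether an expected relation is present. It does not call ClearNLP at all: `parse_tagged`, `has_dep`, and the (label, head, dependent) triple layout are hypothetical helpers, and the dependency set in the example is typed in by hand rather than produced by a parser.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]     # (word form, POS tag)
Dep = Tuple[str, str, str]  # (label, head, dependent), assumed layout


def parse_tagged(sentence: str) -> List[Token]:
    """Split a whitespace-separated word_TAG string into (word, tag) pairs."""
    tokens = []
    for item in sentence.split():
        # Split on the LAST underscore so that ",_," and "._." work too.
        # An item without any underscore ends up with an empty word.
        word, _, tag = item.rpartition("_")
        tokens.append((word, tag))
    return tokens


def has_dep(deps: Set[Dep], label: str, head: str, dependent: str) -> bool:
    """True if the exact (label, head, dependent) triple is in the set."""
    return (label, head, dependent) in deps


# Hand-written example, not real parser output.
tokens = parse_tagged("People_NNS now_RB stare_VBP at_IN me_PRP ._.")
deps = {("nsubj", "stare", "People")}
print(tokens[0], has_dep(deps, "nsubj", "stare", "People"))
```

Matching on word forms is enough for the short sentences here; a sentence that repeats a word would need token indices instead.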

  2. Missing "nsubj" dependency with incorrect POS tagging. The ClearNLP 2.0 dependency parser was able to provide correct dependency trees even if the POS tagging was incorrect. For example, in the cases below the parser used to recognize the predicate. With the new version the predicates went missing:

Trends_NNS that_WDT are_VBP prevalent_JJ today_NN ,_, regardless_RB of_IN industry_NN ,_, matter_NN to_IN everyone_NN ._. Used to have: nsubj(matter, Trends)

Place_NN a_DT large_JJ sheet_NN of_IN parchment_NN paper_NN on_IN a_DT work_NN surface_NN ,_, and_CC dust_NN the_DT parchment_NN lightly_RB with_IN flour_NN ._. Used to have: dobj(place, sheet)

As_IN Babe_NNP enters_VBZ the_DT Olympic_JJ stadium_NN ,_, every_DT person_NN cheers_NNS and_CC yells_NNS her_PRP$ name_NN ._. Used to have: nsubj(yells, person) Used to have: dobj(yells, name)

That_DT is_VBZ ,_, communism_NN functions_NNS as_IN the_DT negation_NN of_IN alienation_NN ,_, which_WDT in_IN turn_NN ,_, alienation_NN is_VBZ the_DT negation_NN of_IN man_NN ._. Used to have: nsubj(functions, communism)

The_DT object_NN wo_MD n't_RB move_VB till_IN an_DT external_JJ force_NN acts_NNS on_IN it_PRP ._. Used to have: nsubj(acts, force)

NB! Yet in some cases the dependencies remained unchanged:

Wendy_NNP thinks_NNS they_PRP 're_VBP frightened_VBN ._. nsubj(thinks, Wendy)

Every_DT opportunity_NN and_CC option_NN changes_NNS us_PRP ._. nsubj(changes, opportunity) nsubj(changes, option) dobj(changes, us)

Must_MD be_VB very_RB busy_JJ because_IN he_PRP does_VBZ not_RB answers_NNS my_PRP$ e-mail_NN or_CC phone_NN call_NN ,_, but_CC he_PRP talk_VBP two_CD weeks_NNS ago_RB ,_, he_PRP needed_VBD some_DT more_JJR time_NN ._. nsubj(answers, he) dobj(answers, e-mail) dobj(answers, call)

If_IN you_PRP suddenly_RB notice_VBP that_IN someone_NN is_VBZ suspiciously_RB interested_JJ for_IN the_DT diamond_NN jewelery_NN display_NN ,_, alert_NN the_DT security_NN ._. dobj(alert, security)

One_CD example_NN of_IN this_DT natural_JJ laws_NNS is_VBZ that_IN everything_NN changes_NNS except_IN of_IN change_NN ,_, itself_PRP ._. nsubj(changes, everything)

Here is an interesting example. In the first sentence, the predicate and its dependencies are present, yet they are gone in the second case. ClearNLP 2.0 returned the correct dependencies in both cases:

Mary_NNP thanks_NNS you_PRP for_IN the_DT ocasional_JJ brotherly_JJ concerns_NNS in_IN the_DT form_NN of_IN wake-up-calls_NNS ._. Present: nsubj(thanks, Mary) dobj(thanks, you)

She_PRP thanks_NNS you_PRP for_IN the_DT ocasional_JJ brotherly_JJ concerns_NNS in_IN the_DT form_NN of_IN wake-up-calls_NNS ._. Gone: nsubj(thanks, she) dobj(thanks, you)

A different case of ambiguity that the previous version of ClearNLP was able to deal with:

By_IN 1778_CD ,_, the_DT Prince_NNP is_VBZ paying_VBG for_IN full_JJ theater_NN seasons_NNS ;_: opera_NN directing_VBG and_CC composing_VBG become_VBN Hayd_NNP n_NN 's_POS full-time_JJ job_NN ._. Used to have: nsubj(become, composing)

  3. Missing "nsubj" with correct POS tagging and an error in the sentence. The interesting thing here is that I now get a "poss" dependency whose head is a modal verb (its modal status is confirmed by the "aux" dependency), which is grammatically impossible.

I_PRP know_VBP your_PRP$ might_MD be_VB hesitant_JJ to_TO teach_VB ,_, but_CC if_IN you_PRP teach_VBP God_NNP 's_POS word_NN it_PRP will_MD be_VB beneficial_JJ to_IN these_DT kids_NNS ._. Used to have: nsubj(be, your) Now: poss(might, your) but still aux(be, might)

I_PRP think_VBP your_PRP$ must_MD be_VB very_RB good_JJ in_IN English_NNP language_NN ._. Used to have: nsubj(be, your) Now: poss(must, your) but still aux(be, must)

I_PRP wish_VBP your_PRP$ will_MD find_VB a_DT better_JJR one_NN ._. Used to have: nsubj(find, your) Now: poss(will, your) but still aux(find, will)

Some cases remained unchanged, though:

I_PRP hope_VBP your_PRP$ are_VBP okay_JJ !_. nsubj(are, your)

I_PRP think_VBP your_PRP$ were_VBD right_JJ about_IN the_DT public_JJ media_NNS ._. nsubj(were, your)
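The "poss" relation headed by a modal could be caught automatically with a small consistency check. The sketch below is only an illustration: `poss_on_modal` is a hypothetical helper, the word-to-tag dictionary and the (label, head, dependent) triples are assumed data shapes rather than anything ClearNLP returns, and the example values are copied by hand from the "must" sentence above.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str, str]  # (label, head, dependent), assumed layout


def poss_on_modal(tags: Dict[str, str], deps: List[Dep]) -> List[Dep]:
    """Return every 'poss' relation whose head word is tagged MD."""
    return [d for d in deps if d[0] == "poss" and tags.get(d[1]) == "MD"]


# Values typed in from the report, not produced by running a parser.
tags = {"I": "PRP", "think": "VBP", "your": "PRP$", "must": "MD", "be": "VB"}
deps = [("poss", "must", "your"), ("aux", "be", "must")]

for label, head, dependent in poss_on_modal(tags, deps):
    print(f"suspicious: {label}({head}, {dependent})")
```

Keying tags by word form is a simplification that breaks when a word occurs twice with different tags; token positions would be the safer key.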

  4. The "appos" dependency went missing. It was present with ClearNLP 2.0:

Anna_NNP ,_, pregnant_JJ 29-year_JJ old_JJ Connecticut_NNP social_JJ worker_NN ,_, is_VBZ at_IN home_NN ._. Used to have: appos(Anna, worker) Now: no relation between "Anna" and "worker". nsubj(is, Anna) nsubj(is, worker)

Peter_NNP ,_, pigeon-toed_JJ penguin_NN ,_, was_VBD a_DT nice_JJ guy_NN ,_, who_WP Bucky_NNP knew_VBD he_PRP could_MD always_RB count_VB on_IN ._. Used to have: appos(Peter, penguin) Now: no relation between "Peter" and "penguin". nsubj(was, Peter) nsubj(was, penguin)

There_EX are_VBP many_JJ free_JJ out_IN there_RB ,_, pick_VB one_NN you_PRP like_VBP or_CC if_IN you_PRP have_VBP no_DT idea_NN my_PRP$ tip_NN would_MD be_VB to_TO download_VB Picasa_NNP ,_, free_JJ photo_NN digital_NN software_NN ._. Used to have: appos(Picasa, software) Now: no relation between "Picasa" and "software".

An interesting case: the "appos" dependency comes back if I remove the quotation marks:

``_" The_DT hangman_NN ,_, grey-haired_JJ convict_NN in_IN the_DT white_JJ uniform_NN of_IN the_DT prison_NN ,_, was_VBD waiting_VBG beside_IN his_PRP$ machine_NN ._. ''_" Used to have: appos(hangman, convict) Now: nsubj(waiting, convict) nsubj(waiting, hangman)

The_DT hangman_NN ,_, grey-haired_JJ convict_NN in_IN the_DT white_JJ uniform_NN of_IN the_DT prison_NN ,_, was_VBD waiting_VBG beside_IN his_PRP$ machine_NN ._. appos(hangman, convict) nsubj(waiting, hangman)
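Comparisons like the ones in this report can be made mechanical by diffing the relation sets from the two versions. This is a rough sketch under stated assumptions: `diff_deps` is a hypothetical helper, relations are (label, head, dependent) triples, and both sets below contain only the relations quoted for the hangman sentence in quotation marks, so they are partial and typed in by hand.

```python
from typing import Dict, List, Set, Tuple

Dep = Tuple[str, str, str]  # (label, head, dependent), assumed layout


def diff_deps(old: Set[Dep], new: Set[Dep]) -> Dict[str, List[Dep]]:
    """List relations that disappeared and relations that newly appeared."""
    return {"lost": sorted(old - new), "gained": sorted(new - old)}


# Partial sets copied from the report, not full parser output.
old = {("appos", "hangman", "convict")}
new = {("nsubj", "waiting", "convict"), ("nsubj", "waiting", "hangman")}

changes = diff_deps(old, new)
print("lost:", changes["lost"])
print("gained:", changes["gained"])
```

Running the same diff over a whole test corpus would show at a glance which labels regress most often between versions.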

  5. Just one more error I came across:

Let_VB me_PRP know_VB if_IN those_DT dates_NNS work_VB for_IN you_PRP and_CC we_PRP 'll_MD get_VB you_PRP ticket_NN ._. poss(ticket, you)