christianbuck / nlu


Some sort of off-by-one error in generating coref mention strings #4

Closed nschneid closed 11 years ago

nschneid commented 11 years ago

The string in the coref chain for wsj_0001.1 is missing a word:

    [
      4,
      11,
      1,
      "chairman of Elsevier N.V. , Dutch publishing group"
    ]

It should have "the" after the comma, i.e. "chairman of Elsevier N.V. , the Dutch publishing group".

nschneid commented 11 years ago

In the extraction pipeline, corefStanford.py was deleting all occurrences of 'the' from the string; likewise, corefOntonotes.py was deleting all occurrences of 'the', 'mr.', and 'mrs.'. Both were modified to only delete these if they occur as the first word of the string.
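To illustrate the difference, here is a minimal sketch of the two behaviours. It is not the actual code from corefStanford.py or corefOntonotes.py: the function names, the whitespace tokenization, and the case-insensitive comparison are all assumptions made for the example.

```python
# Hypothetical illustration only; names and tokenization are assumptions,
# not taken from corefStanford.py or corefOntonotes.py.
LEADING_WORDS = {"the", "mr.", "mrs."}


def strip_everywhere(mention, words=LEADING_WORDS):
    """Old behaviour as described: drop every occurrence of the listed words."""
    return " ".join(t for t in mention.split() if t.lower() not in words)


def strip_leading(mention, words=LEADING_WORDS):
    """Fixed behaviour as described: drop a listed word only if it is the first token."""
    tokens = mention.split()
    if tokens and tokens[0].lower() in words:
        tokens = tokens[1:]
    return " ".join(tokens)


if __name__ == "__main__":
    mention = "chairman of Elsevier N.V. , the Dutch publishing group"
    print(strip_everywhere(mention))  # loses the mid-string "the"
    print(strip_leading(mention))     # leaves the mention unchanged
    print(strip_leading("the Dutch publishing group"))
```

With the first function, the mention from wsj_0001.1 loses its interior "the", which reproduces the string reported above; with the second, only a leading "the", "mr.", or "mrs." is removed.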