nschneid closed 11 years ago
In the extraction pipeline, corefStanford.py was deleting all occurrences of 'the' from the string; likewise, corefOntonotes.py was deleting all occurrences of 'the', 'mr.', and 'mrs.'. Both were modified to delete these words only when they occur as the first word of the string.
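A minimal sketch of the "first word only" behavior, not the actual code from either script. The function name `strip_leading_word`, the `LEADING_WORDS` tuple, whitespace tokenization, and the choice to leave a one-word string untouched are all assumptions made for illustration; the example strings are invented, not taken from the corpus.

```python
# Hypothetical helper: the real corefStanford.py / corefOntonotes.py code
# is not shown in this thread, so names and tokenization are assumptions.
LEADING_WORDS = ("the", "mr.", "mrs.")


def strip_leading_word(mention, words=LEADING_WORDS):
    """Drop one of `words` only when it is the first token of `mention`.

    Later occurrences (e.g. a "the" after a comma) are left in place.
    """
    tokens = mention.split()
    # Assumption: a one-word mention is kept as-is rather than emptied.
    if len(tokens) > 1 and tokens[0].lower() in words:
        return " ".join(tokens[1:])
    return mention


print(strip_leading_word("the chairman , the founder"))
print(strip_leading_word("acme corp. , the maker"))
```

Under these assumptions, only a leading 'the', 'mr.', or 'mrs.' is dropped, so a 'the' that follows a comma inside the string survives.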
The string in the coref chain in wsj_0001.1 is missing a word:
It should have "the" after the comma.