Constannnnnt / Distributed-CoreNLP

This infrastructure, built on Stanford CoreNLP, MapReduce, and Spark with Java, aims to process document annotations at large scale.
https://github.com/Constannnnnt/Distributed-CoreNLP
MIT License

finish regexner #24

Closed KaisongHuang closed 5 years ago

KaisongHuang commented 5 years ago

Put additional NER rules into regexner.txt, whose path is hard-coded. The simplest rule file has two tab-separated fields per line: the phrase to match and the NER label to assign.

Bachelor of Arts    DEGREE
Bachelor of Laws    DEGREE
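
To make the two-field format concrete, here is a minimal sketch of how such lines could be read. The class `RegexNerRuleParser` and its `Rule` type are hypothetical names made up for illustration; they are not part of CoreNLP or of this repository, and CoreNLP's own loader may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not CoreNLP's rule loader.
class RegexNerRuleParser {

    // One rule: the tokens of the phrase to match and the label to assign.
    static final class Rule {
        final List<String> tokens;
        final String label;

        Rule(List<String> tokens, String label) {
            this.tokens = tokens;
            this.label = label;
        }
    }

    // Parses lines of the form "<phrase>\t<LABEL>"; blank lines are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException(
                        "Expected two tab-separated fields: " + line);
            }
            // Assumption: the phrase is split into tokens on whitespace.
            List<String> tokens = Arrays.asList(fields[0].trim().split("\\s+"));
            rules.add(new Rule(tokens, fields[1].trim()));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList(
                "Bachelor of Arts\tDEGREE",
                "Bachelor of Laws\tDEGREE"));
        for (Rule r : rules) {
            System.out.println(r.tokens + " -> " + r.label);
        }
    }
}
```

A line without a tab is rejected here rather than silently ignored, so a rule file saved with spaces instead of tabs fails loudly.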

Let's say we have an input file with the following sentence.

She graduated from the University of Melbourne with a Bachelor of Arts and a Bachelor of Laws in 1986.

With the "ner" annotator, we get the following output.

(She,O) (graduated,O) (from,O) (the,O) (University,ORGANIZATION) (of,ORGANIZATION) (Melbourne,ORGANIZATION) (with,O) (a,O) (Bachelor,TITLE) (of,O) (Arts,O) (and,O) (a,O) (Bachelor,TITLE) (of,O) (Laws,O) (in,O) (1986,DATE) (.,O)

With the "regexner" annotator, we get the following output.

(She,O) (graduated,O) (from,O) (the,O) (University,ORGANIZATION) (of,ORGANIZATION) (Melbourne,ORGANIZATION) (with,O) (a,O) (Bachelor,DEGREE) (of,DEGREE) (Arts,DEGREE) (and,O) (a,O) (Bachelor,DEGREE) (of,DEGREE) (Laws,DEGREE) (in,O) (1986,DATE) (.,O)
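
The difference between the two outputs can be reproduced with a small sketch that lays phrase rules over already-labelled tokens. `RegexNerOverlay` is a hypothetical class written for this illustration, not the CoreNLP implementation. It assumes exact, case-sensitive token matching and replaces whatever label a matched span already carries; the real annotator's matching and overwrite behaviour may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the real regexner annotator is more capable.
class RegexNerOverlay {

    // Overwrites labels wherever the phrase's tokens occur consecutively.
    // Simplification: any existing label inside a matched span is replaced.
    static void apply(List<String> tokens, String[] labels,
                      String phrase, String label) {
        List<String> pattern = Arrays.asList(phrase.split("\\s+"));
        for (int i = 0; i + pattern.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + pattern.size()).equals(pattern)) {
                Arrays.fill(labels, i, i + pattern.size(), label);
            }
        }
    }

    public static void main(String[] args) {
        // Fragment of the example sentence with its "ner" labels.
        List<String> tokens = Arrays.asList(
                "with", "a", "Bachelor", "of", "Arts",
                "and", "a", "Bachelor", "of", "Laws");
        String[] labels = {"O", "O", "TITLE", "O", "O",
                           "O", "O", "TITLE", "O", "O"};

        apply(tokens, labels, "Bachelor of Arts", "DEGREE");
        apply(tokens, labels, "Bachelor of Laws", "DEGREE");

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < labels.length; i++) {
            out.append('(').append(tokens.get(i)).append(',')
               .append(labels[i]).append(") ");
        }
        System.out.println(out.toString().trim());
    }
}
```

Run on this fragment, both "Bachelor of ..." spans end up labelled DEGREE on every token, as in the "regexner" output above.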