This is a named entity recognizer based on pytorch-pretrained-bert.
Requirements:

- Python 3.5+
- PyTorch 0.4.1
- pytorch-pretrained-bert 0.6.1
- tqdm
- PyYAML
Repository contents:

- `njuner`
- `preprocess_msra.py`
- `preprocess_pd98.py`
- `run_ner.py`
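This README does not document what `preprocess_msra.py` and `preprocess_pd98.py` do. A common way to prepare a word-segmented, entity-tagged corpus for a character-level tagger is to project each word's tag onto its characters in BIO form. The sketch below shows that idea under stated assumptions: input lines look like `word/tag word/tag ...`, and the tags `nr`, `ns` and `nt` mark persons, locations and organizations. `TAG_MAP` and `words_to_char_bio` are hypothetical names, and this is not the repository's preprocessing script.

```python
# ASSUMPTION: word-level tags follow the nr/ns/nt convention for PER/LOC/ORG.
TAG_MAP = {"nr": "PER", "ns": "LOC", "nt": "ORG"}


def words_to_char_bio(line, tag_map=TAG_MAP):
    """Convert 'word/tag word/tag ...' into (char, BIO label) pairs."""
    pairs = []
    for item in line.split():
        word, sep, tag = item.rpartition("/")
        if not sep:
            # No '/tag' suffix: treat the whole token as a non-entity word.
            word, tag = item, "o"
        ent = tag_map.get(tag.lower())
        for i, ch in enumerate(word):
            if ent is None:
                pairs.append((ch, "O"))
            else:
                # First character opens the entity, the rest continue it.
                pairs.append((ch, ("B-" if i == 0 else "I-") + ent))
    return pairs
```

For example, `words_to_char_bio("李雷/nr 去/o 上海/ns 。/o")` yields one `(char, label)` pair per character, which is the granularity the tagger works at.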
A NER tool that recognizes predefined entity types, such as persons, locations and organizations, in text. It is fully end-to-end and requires neither word segmentation nor part-of-speech information.
Installation:

```shell
pip install njuner
```
As a package
```python
from njuner import NJUNER

model_path = 'path/to/model'  # placeholder: directory of an uncompressed pretrained model
ner = NJUNER(model_dir=model_path)
ner.label(['李雷和韩梅梅去上海迪斯尼乐园。'])
# [[('B-PER', '李'), ('I-PER', '雷'), ('O', '和'), ('B-PER', '韩'), ('I-PER', '梅'), ('I-PER', '梅'), ('O', '去'), ('B-ORG', '上'), ('I-ORG', '海'), ('I-ORG', '迪'), ('I-ORG', '斯'), ('I-ORG', '尼'), ('I-ORG', '乐'), ('I-ORG', '园'), ('O', '。')]]
```
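`ner.label` returns, for each input sentence, a list of `(tag, character)` pairs in BIO format. To turn that into whole entities, the pairs can be grouped into spans. `bio_to_entities` below is a hypothetical helper, not part of the `njuner` package; it is a minimal sketch that treats an `I-` tag that does not continue the open span as the start of a new entity.

```python
def bio_to_entities(tagged):
    """Group (tag, char) pairs from one sentence into (entity_type, text) spans."""
    entities = []
    cur_type, cur_chars = None, []
    for tag, ch in tagged:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # A B- tag, or an I- tag of a different type, opens a new entity.
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_chars)))
            cur_type, cur_chars = tag[2:], [ch]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            cur_chars.append(ch)
        else:
            # 'O' closes any open entity.
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_chars)))
            cur_type, cur_chars = None, []
    if cur_type is not None:
        entities.append((cur_type, "".join(cur_chars)))
    return entities
```

Applied to the first sentence of the output above, for example `bio_to_entities(ner.label([...])[0])`, this gives `[('PER', '李雷'), ('PER', '韩梅梅'), ('ORG', '上海迪斯尼乐园')]`.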
As a command line tool
```shell
njuner -h  # print help
njuner --model_dir model_path --input_file input.txt --output_dir ./
```
This writes three files to the output directory: "tokens.txt", "predictions.txt" and "summary.txt".
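To drive the command-line tool from a Python script, a thin wrapper around `subprocess` is enough. This sketch uses only the flags shown above; `build_njuner_command` and `run_njuner` are hypothetical helper names, and it makes no assumption about the contents of the three output files, only checking that they exist.

```python
import subprocess
from pathlib import Path

# File names listed in this README; their internal format is not assumed here.
OUTPUT_FILES = ("tokens.txt", "predictions.txt", "summary.txt")


def build_njuner_command(model_dir, input_file, output_dir="./"):
    """Assemble the njuner CLI call using only the documented flags."""
    return [
        "njuner",
        "--model_dir", str(model_dir),
        "--input_file", str(input_file),
        "--output_dir", str(output_dir),
    ]


def run_njuner(model_dir, input_file, output_dir="./"):
    """Run njuner and return the paths of the expected output files that exist."""
    cmd = build_njuner_command(model_dir, input_file, output_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    out = Path(output_dir)
    return [out / name for name in OUTPUT_FILES if (out / name).exists()]
```

Passing the command as a list rather than a shell string avoids quoting problems when paths contain spaces.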
Pretrained models
Several pretrained models are available on the NJUNER releases page. Download one, uncompress the archive, and pass the extracted directory to the `model_dir` parameter (`--model_dir` on the command line).
Results of training and testing on each corresponding dataset.
Model | MSRA | Weibo-NE | Resume | CoNLL-2003 |
---|---|---|---|---|
Baseline | 93.18 | 55.28 | 94.46 | 92.4 |
NJUNER | 95.02 | 66.95 | 95.62 | 91.7 |
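The per-dataset difference between NJUNER and the baseline can be read off the table above. This short sketch computes it from the scores copied verbatim from the table; `score_deltas` is a hypothetical name, and the metric itself is not named in this README.

```python
# Scores copied from the table above.
DATASETS = ("MSRA", "Weibo-NE", "Resume", "CoNLL-2003")
BASELINE = {"MSRA": 93.18, "Weibo-NE": 55.28, "Resume": 94.46, "CoNLL-2003": 92.4}
NJUNER = {"MSRA": 95.02, "Weibo-NE": 66.95, "Resume": 95.62, "CoNLL-2003": 91.7}


def score_deltas(ours, baseline):
    """Return ours minus baseline for each dataset, rounded to two decimals."""
    return {name: round(ours[name] - baseline[name], 2) for name in baseline}


deltas = score_deltas(NJUNER, BASELINE)
for name in DATASETS:  # fixed order (dicts are unordered on Python 3.5)
    print("{}: {:+.2f}".format(name, deltas[name]))
```

This prints +1.84, +11.67, +1.16 and -0.70. The largest gain is on Weibo-NE, and CoNLL-2003 is the only dataset where NJUNER scores below the baseline.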
Comparison of different Chinese NER tools.
Tool | MSRA | Weibo-NE |
---|---|---|
HanLP | 72.65 | 38.66 |
LTP | 73.34 | 43.97 |
NJUNER | 81.58 | 63.08 |
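A useful way to read this comparison is NJUNER's margin over the strongest of the other tools on each dataset. The sketch below uses the scores copied from the table; `margin_over_best_other` is a hypothetical name.

```python
# Scores copied from the comparison table above.
SCORES = {
    "HanLP": {"MSRA": 72.65, "Weibo-NE": 38.66},
    "LTP": {"MSRA": 73.34, "Weibo-NE": 43.97},
    "NJUNER": {"MSRA": 81.58, "Weibo-NE": 63.08},
}


def margin_over_best_other(scores, tool, dataset):
    """Return (strongest competing tool, tool's score minus that tool's score)."""
    others = [(name, s[dataset]) for name, s in scores.items() if name != tool]
    best_name, best_score = max(others, key=lambda pair: pair[1])
    return best_name, round(scores[tool][dataset] - best_score, 2)
```

LTP is the strongest alternative on both datasets; `margin_over_best_other(SCORES, "NJUNER", "MSRA")` gives a margin of 8.24 points, and the Weibo-NE margin is 19.11 points.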