Code and models for neural modeling of Hebrew NER, as described in the TACL paper "Neural Modeling for Named Entities and Morphology (NEMO2)", along with extensive experiments on the different modeling scenarios provided in this repository.
```bash
git clone https://github.com/OnlpLab/NEMO.git
cd NEMO
pip install -r requirements.txt
gunzip data/*.gz
```

Install YAP: https://github.com/OnlpLab/yap. To run `nemo.py` locally, change `YAP_PATH` in `config.py` to the path of your local `yap` executable. To serve the API, start YAP with `./yap api` and run:

```bash
uvicorn api_main:app --port 8090
```

Alternatively, run everything with Docker: `docker-compose up` (pulls, builds and/or startup will take a few minutes, depending on your bandwidth). The API is then served on port `8090`; see `docker-compose.yml` for details.
All scenarios are run using `nemo.py` with a specific command (scenario) on a text file of Hebrew sentences separated by line breaks. For example, the `run_ner_model` command with the `token-single` model will tokenize the sentences and run the `token-single` model:

```bash
python nemo.py run_ner_model token-single example.txt example_output.txt
```
The `morph_hybrid` command runs the end-to-end segmentation and NER pipeline which provided our best-performing morpheme-level NER boundaries:

```bash
python nemo.py morph_hybrid morph example.txt example_output_MORPH.txt
```
All models are run through `nemo.py`. They are standard Bi-LSTM-CRF sequence labelers with character encoding (LSTM/CNN), implemented in NCRF++, with pre-trained fastText embeddings. Differences between the models lie in:

1. Input units: morphemes (`morph`) vs. tokens (`token-*`).
2. Output labels: `token-single` predicts a single label per token (e.g. `B-ORG`), while `token-multi` predicts multi-labels (concatenations of atomic labels, e.g. `O-ORG^B-ORG^I-ORG`) that encode, in order, the labels of the morphemes the token is made of.

| Token-based Models | Morpheme-based Model |
|---|---|
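To make the multi-label format concrete, here is a minimal sketch of unpacking a token-multi label into its ordered per-morpheme labels (the helper name is ours, not part of the repository):

```python
def unpack_multi_label(multi_label: str) -> list[str]:
    """Split a token-multi label into its ordered per-morpheme atomic labels.

    The atomic labels are concatenated with '^', in the order of the
    morphemes the token is made of (format as in the example above).
    """
    return multi_label.split("^")

# The three atomic labels of a three-morpheme token:
print(unpack_multi_label("O-ORG^B-ORG^I-ORG"))  # ['O-ORG', 'B-ORG', 'I-ORG']
```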
For morpheme-level scenarios, morphemes must first be predicted. This is done by performing morphological disambiguation (MD). We offer two ways to do so:

1. Standard pipeline: the `morph_yap` command runs our `morph` NER model on the output of YAP's joint segmentation.
2. Hybrid pipeline: NER predictions of the `token-multi` model are used to reduce the MD option space. This is used in `morph_hybrid`, `multi_align_hybrid` and `morph_hybrid_align_tokens`. We explain these scenarios next.

| MD Approach | Commands |
|---|---|
| Standard | `morph_yap` |
| Hybrid | `morph_hybrid`, `multi_align_hybrid`, `morph_hybrid_align_tokens` |
Finally, to get our desired output units (tokens or morphemes), we can choose between different scenarios, some involving extra post-processing alignments. For morpheme-level output:

1. Run the `morph` NER model on predicted morphemes: `morph_yap` or `morph_hybrid` (better).
2. Align the `token-multi` model's predictions with predicted morphemes to get morpheme-level boundaries: `multi_align_hybrid`.

| Run morph NER on Predicted Morphemes | Multi Predictions Aligned with Predicted Morphemes |
|---|---|
| `morph_yap`, `morph_hybrid` | `multi_align_hybrid` |
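The alignment idea behind aligning multi-label predictions with predicted morphemes can be sketched roughly as follows (an illustration of the concept only, not the repository's actual implementation; the pad/truncate policy is our simplification):

```python
def align_multi_with_morphemes(multi_label, morphemes):
    """Assign the atomic labels of a token-multi label to the morphemes
    predicted for that token, in order.

    Simplified policy: pad with "O" or truncate when the number of atomic
    labels disagrees with the number of predicted morphemes.
    """
    labels = multi_label.split("^")
    labels = (labels + ["O"] * len(morphemes))[:len(morphemes)]
    return list(zip(morphemes, labels))

# A two-morpheme token whose second morpheme starts a GPE mention:
print(align_multi_with_morphemes("O^B-GPE", ["ל", "ישראל"]))
# [('ל', 'O'), ('ישראל', 'B-GPE')]
```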
For token-level output:

1. Run the `token-single` model: the `run_ner_model` command with the `token-single` model.
2. Map `token-multi` predictions to `token-single` labels to get standard token-level output: the `multi_to_single` command does this end-to-end.
3. Align `morph` NER predictions with tokens: `morph_hybrid_align_tokens` (this achieved the best token-level results in our experiments).

| Run token-single | Map token-multi to token-single | Align morph NER with Tokens |
|---|---|---|
| `run_ner_model token-single` | `multi_to_single` | `morph_hybrid_align_tokens` |
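One plausible collapse rule for mapping a multi-label to a single token-level label is to take the first entity-bearing atomic label (illustrative only; the actual `multi_to_single` mapping may differ):

```python
def multi_to_single_label(multi_label: str) -> str:
    """Collapse a token-multi label into one token-level label.

    Illustrative rule: the token takes its first non-O atomic label,
    falling back to "O" when no morpheme is part of an entity.
    """
    for atomic in multi_label.split("^"):
        if atomic != "O":
            return atomic
    return "O"

print(multi_to_single_label("O^B-ORG^I-ORG"))  # B-ORG
print(multi_to_single_label("O^O"))            # O
```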
While the `morph_hybrid*` scenarios offer the best performance, they are slightly less efficient, since they require running both the `morph` and `token-multi` NER models (YAP calls take up most of the runtime anyway, so this is not very significant).

To use our trained models, download the *oov* models from here and extract them to the `data/` folder (their paths already appear in `config.py`).

We provide template NCRF++ config files, which already contain the hyperparameters we used in our training. To train your own model:

1. Set `word_emb_dir` to the path of an embedding vectors file in standard word2vec textual format. You can use the fastText bin models we make available (in the next section) or any other embedding vectors of your choice.
2. Run:

```bash
python ncrf_main.py --config <path_to_config> --device <gpu_device_number>
```
The word embeddings we trained and used in our models are available:
These were trained on a 2013 Wiki dump corpus by Yoav Goldberg, which we re-tokenized and then re-parsed using YAP:
To evaluate your predictions against gold, use the `ne_evaluate_mentions.py` script. Evaluation looks for an exact match of mention string and entity category, but differs slightly from the standard CoNLL-2003 evaluation commonly used for NER. The reason is that predicted segmentation can differ from gold segmentation, so positional indexes of sequence labels cannot be used. What we do instead is extract multisets of entity mentions and use set operations to compute precision, recall and F1-score. A more detailed discussion of the evaluation can be found in the NEMO2 paper.

To evaluate an output prediction file against a gold file, use:

```bash
python ne_evaluate_mentions.py <path_to_gold_ner> <path_to_predicted_ner>
```

If you're within Python, just call `ne_evaluate_mentions.evaluate_files(...)` with the same parameters.
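The multiset-based scoring described above can be sketched with `collections.Counter` (a minimal illustration of the idea, not the actual `ne_evaluate_mentions.py` code; mentions are represented here as (string, category) pairs):

```python
from collections import Counter

def mention_prf(gold, pred):
    """Precision/recall/F1 over multisets of (mention string, category) pairs.

    Counter intersection (&) counts each mention at most as many times as it
    appears in both gold and predicted, so no positional indexes are needed.
    """
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    tp = sum((gold_counts & pred_counts).values())
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("מכבי חיפה", "ORG"), ("חיפה", "LOC")]
pred = [("מכבי חיפה", "ORG"), ("ישראל", "GPE")]
print(mention_prf(gold, pred))  # (0.5, 0.5, 0.5)
```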
In our NEMO2 paper we also evaluate our models on the Ben-Mordecai Hebrew NER Corpus (BMC). The 3 random splits we used can be found here.
If you use any of the NEMO2 code, models, embeddings or the NEMO corpus, please cite the NEMO2 paper:

```bibtex
@article{10.1162/tacl_a_00404,
    author = {Bareket, Dan and Tsarfaty, Reut},
    title = "{Neural Modeling for Named Entities and Morphology (NEMO2)}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {9},
    pages = {909-928},
    year = {2021},
    month = {09},
    abstract = "{Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically rich languages (MRLs) pose a challenge to this basic formulation, as the boundaries of named entities do not necessarily coincide with token boundaries, rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental questions, namely, what are the basic units to be labeled, and how can these units be detected and classified in realistic settings (i.e., where no gold morphology is available). We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and that a novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline, where morphological decomposition strictly precedes NER, setting a new performance bar for both Hebrew NER and Hebrew morphological decomposition tasks.}",
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00404},
    url = {https://doi.org/10.1162/tacl\_a\_00404},
    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00404/1962472/tacl\_a\_00404.pdf},
}
```
If you use NEMO2's NER models, please also cite NCRF++:

```bibtex
@inproceedings{yang2018ncrf,
    title={{NCRF}++: An Open-source Neural Sequence Labeling Toolkit},
    author={Yang, Jie and Zhang, Yue},
    booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
    url={http://aclweb.org/anthology/P18-4013},
    year={2018}
}
```