This is an accompanying repository for our *SEM 2018 paper (.pdf). It contains the code to replicate the experiments and train the models described in the paper.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Please use the following citation:
@inproceedings{TUD-CS-2018-01,
title = {Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories},
author = {Sorokin, Daniil and Gurevych, Iryna},
publisher = {Association for Computational Linguistics},
booktitle = {Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018) },
pages = {to appear},
month = jun,
year = {2018},
location = {New Orleans, LA, U.S.}
}
The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.
We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.
Please refer to the paper for the model description and training details.
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
File | Description |
---|---|
configs/ | Configuration files for the experiments |
entitylinking/core | Mention extraction and candidate retrieval |
entitylinking/datasets | Datasets IO |
entitylinking/evaluation | Evaluation measures and scripts |
entitylinking/mlearning | Model definition and training scripts |
entitylinking/wikidata | Retrieving information from Wikidata |
resources/ | Necessary resources |
trainedmodels/ | Trained models |

See `requirements.txt` for the full list of packages.

1. Create a new conda environment with `conda create -n qa-env python=3.6` and activate it with `conda activate qa-env`
2. Install PyTorch with `conda install pytorch=0.3.1 -c pytorch` (with CUDA if you want to use GPU)
3. Install the packages from `requirements.txt` with: `conda install --yes --file requirements.txt`
4. Install pycorenlp and SPARQLWrapper with `pip install pycorenlp SPARQLWrapper`
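
After installation, it can help to confirm that the packages are visible from the active environment. The following is a minimal sketch, not part of the repository: it assumes the import names `torch`, `pycorenlp` and `SPARQLWrapper` for the packages named in the steps above, and the function name `check_environment` is made up for illustration.

```python
import importlib.util
import sys

# Import names assumed to correspond to the packages named in the steps above.
REQUIRED_MODULES = ["torch", "pycorenlp", "SPARQLWrapper"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # The steps above create the environment with Python 3.6.
    print("Python version: {}.{}".format(*sys.version_info[:2]))
    for name, found in check_environment().items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```

The check only locates the modules without importing them, so it does not verify package versions or GPU support.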

Follow the steps to use this project as an external entity-linking tool. `FeatureModel_Baseline` is a part of the repository; you can download the VCG model here.
For the VCG model you also need KB embeddings produced by Fast-TransX. Download here.

- `trainedmodels/` folder in the main directory of the project
- `resources/glove/` in the main directory of the project
- `trainedmodels/FeatureModel_Baseline.param`
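
Before loading a model, one might check that the paths named above exist relative to the main directory of the project. This is a small sketch under that assumption; the helper `missing_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths taken from the steps above, relative to the project's main directory.
EXPECTED_PATHS = [
    "trainedmodels",
    "resources/glove",
    "trainedmodels/FeatureModel_Baseline.param",
]


def missing_paths(project_root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All expected files and folders are in place.")
```

Run it from the main directory of the project, or pass that directory as `project_root`.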
from entitylinking import core
entitylinker = core.MLLinker(path_to_model="trainedmodels/FeatureModel_Baseline.torchweights")
output = entitylinker.link_entities_in_raw_input("Barack Obama is a president.")
print(output.entities)
run_experiments.sh