Project for the paper "Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information" (LREC-COLING 2024)
#>pip install -r requirements.txt
#>export PYTHONPATH=<ROOT_PROJECT_FOLDER>
The main training process requires the mention pairs and embeddings from each set.
We first construct RST trees for each document. When generating mention pairs, we construct cross-document lexical chains.
Since the WEC-Eng/WEC-Zh train set contains many mentions, generating all negative pairs is very resource- and time-consuming.
To that end, we added a control for the negative:positive ratio.
#>python src/preprocess_gen_pairs.py
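As an illustration of what a negative:positive ratio control can look like, here is a minimal sketch. It is not the code in src/preprocess_gen_pairs.py; the function name `sample_pairs`, its parameters, and the `(mention_id, cluster_id)` input format are assumptions made for this example only.

```python
import random
from itertools import combinations


def sample_pairs(mentions, neg_pos_ratio, seed=0):
    """Build labeled mention pairs and cap negatives at neg_pos_ratio * #positives.

    `mentions` is a list of (mention_id, cluster_id) tuples (hypothetical format).
    Returns a list of (mention_id_a, mention_id_b, label) with label 1 for
    coreferent pairs and 0 otherwise.
    """
    positives, negatives = [], []
    for (id_a, cluster_a), (id_b, cluster_b) in combinations(mentions, 2):
        if cluster_a == cluster_b:
            positives.append((id_a, id_b, 1))
        else:
            negatives.append((id_a, id_b, 0))

    # Keep at most neg_pos_ratio negatives per positive pair.
    max_negatives = int(len(positives) * neg_pos_ratio)
    if len(negatives) > max_negatives:
        rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
        negatives = rng.sample(negatives, max_negatives)

    return positives + negatives
```

For brevity this sketch still enumerates every candidate pair before subsampling; on a large train set one would more likely sample negatives without materializing all of them.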
To generate the embeddings for WEC-Eng/WEC-Zh, run the following script and provide the split files location, for example:
#>python src/preprocess_embed.py
We use the generated embeddings to initialize node
See the train.py file header for the complete set of script parameters.
The model file is saved in the output folder (for each iteration that improves).
#> python src/train.py
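The "save on improvement" behaviour can be pictured with the sketch below. It is only an assumption about the general pattern, not the logic in src/train.py: the helper `save_if_improved`, the file naming, the use of a development-set score as the criterion, and `pickle` as a stand-in for the framework's own serialization are all hypothetical.

```python
import pickle
from pathlib import Path


def save_if_improved(model_state, dev_score, best_score, output_dir, iteration):
    """Write a checkpoint only when dev_score beats best_score.

    Returns the best score seen so far, to be passed into the next call.
    """
    if dev_score <= best_score:
        return best_score  # no improvement, nothing is written

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per improving iteration.
    with open(out / f"model_iter_{iteration}.pkl", "wb") as fh:
        pickle.dump(model_state, fh)
    return dev_score
```

A training loop would start with `best = float("-inf")` and call `best = save_if_improved(state, score, best, output_dir, i)` once per iteration.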
#> python src/inference.py
#> python src/custer.py