cooper12121 / DIE-EC

DIE-EC

Project for the paper "Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information" (LREC-COLING 2024).



Prerequisites

Preprocessing

The main training process requires the mention pairs and embeddings from each set.

Construct RST trees and lexical chains

We first construct an RST tree for each document. When generating mention pairs, we construct cross-document lexical chains.

WEC-Eng and WEC-Zh

Since the WEC-Eng/WEC-Zh training sets contain many mentions, generating all negative pairs is very resource- and time-consuming. To address this, we added a control for the negative:positive ratio.

#>python src/preprocess_gen_pairs.py
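
The effect of a negative:positive ratio control can be illustrated with a minimal sketch. This is not the code in src/preprocess_gen_pairs.py: the function name `sample_pairs`, the `neg_pos_ratio` parameter, and the `(mention_id, cluster_id)` input layout are all assumptions made for illustration. For clarity the sketch enumerates every pair before sampling, which a memory-conscious implementation on a large set would likely avoid.

```python
import itertools
import random


def sample_pairs(mentions, neg_pos_ratio, seed=0):
    """Return all positive pairs plus a capped random sample of negatives.

    `mentions` is a list of (mention_id, cluster_id) tuples. This layout is
    an assumption for the sketch, not the repository's data format.
    """
    positives, negatives = [], []
    for (id_a, cluster_a), (id_b, cluster_b) in itertools.combinations(mentions, 2):
        pair = (id_a, id_b)
        if cluster_a == cluster_b:
            positives.append(pair)  # same gold cluster: coreferent pair
        else:
            negatives.append(pair)  # different clusters: non-coreferent pair

    # Keep at most neg_pos_ratio negatives for every positive pair.
    max_negatives = min(len(negatives), neg_pos_ratio * len(positives))
    rng = random.Random(seed)
    sampled_negatives = rng.sample(negatives, max_negatives)
    return positives, sampled_negatives


mentions = [("m1", "c1"), ("m2", "c1"), ("m3", "c2"), ("m4", "c3"), ("m5", "c4")]
positives, negatives = sample_pairs(mentions, neg_pos_ratio=2)
print(len(positives), len(negatives))
```

With five mentions there are ten candidate pairs, of which one is positive, so a ratio of 2 keeps two of the nine negatives.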

Generate Embeddings

To generate the embeddings for WEC-Eng/WEC-Zh, run the following script and provide the location of the split files, for example:

#>python src/preprocess_embed.py 

Initialize nodes

We use the generated embeddings to initialize the nodes.

#>python src/preprocess_edu_embed.py

Training

See the header of train.py for the complete set of script parameters. A model file is saved to the output folder after each iteration that improves on the previous best.
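
The save-on-improvement behaviour can be sketched roughly as follows. This is an illustration under stated assumptions rather than the logic in train.py: the function `train_with_checkpoints`, the use of a development-set score where higher is better, and the `model_iter_N.json` file naming are all hypothetical, and a JSON record stands in for real model weights.

```python
import json
import os
import tempfile


def train_with_checkpoints(dev_scores, output_dir):
    """Write a checkpoint each time the score beats the best seen so far.

    `dev_scores` stands in for per-iteration evaluation results; a real
    loop would train and evaluate a model at each step instead.
    """
    os.makedirs(output_dir, exist_ok=True)
    best_score = float("-inf")
    saved_paths = []
    for iteration, score in enumerate(dev_scores):
        if score > best_score:  # strict improvement only; ties are skipped
            best_score = score
            path = os.path.join(output_dir, f"model_iter_{iteration}.json")
            with open(path, "w") as handle:
                # Placeholder for serializing the model weights.
                json.dump({"iteration": iteration, "dev_score": score}, handle)
            saved_paths.append(path)
    return saved_paths


output_dir = tempfile.mkdtemp()
saved = train_with_checkpoints([0.61, 0.58, 0.66, 0.66, 0.70], output_dir)
print([os.path.basename(path) for path in saved])
```

Only iterations that strictly improve on the best score produce a file, so the output folder ends up holding one checkpoint per improvement rather than one per iteration.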

Cluster

#> python src/custer.py