KID

Source code for "A Double-Graph Based Framework for Frame Semantic Parsing" @ NAACL 2022

Overview

TL;DR: A frame semantic parsing framework that uses a double-graph structure to inject knowledge into parsing and to strengthen interactions between arguments.

Paper Link: A Double-Graph Based Framework for Frame Semantic Parsing. TODO: this is an early version of our paper; we will replace it with the camera-ready version soon.

We find that ontological frame knowledge can contribute to frame semantic parsing through both intra-frame and inter-frame reasoning. To make use of it, we build FKG (Frame Knowledge Graph) from the definitions of FEs, frame relations, and FE mappings. In addition, we regard frame semantic parsing as a process of incrementally adding nodes to FSG (Frame Semantic Graph), which strengthens the relations between arguments and the interactions between the subtasks of frame semantic parsing.

Requirements

pandas == 1.4.1
pytorch == 1.11.0
scipy == 1.8.0
stanza == 1.3.0
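
If you install the dependencies with pip, the following command matches the versions above (note that the PyPI package for PyTorch is torch; this is only a suggested setup, not an official install script):

pip install pandas==1.4.1 torch==1.11.0 scipy==1.8.0 stanza==1.3.0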

How to Run Our Code?

Data Preprocessing

You can generate some of the data files from scratch; we will also provide links to download the remaining data files.

Data Folder Structure

Please make sure your data folder structure matches the layout below.

.
├── dev_instance_dic.npy
├── exemplar_instance_dic.npy
├── fe_label_to_dict.npy
├── fn1.5
│   └── conll
│       ├── dev
│       ├── exemplar
│       ├── frames
│       ├── test
│       └── train
├── fndata-1.5
│   └── ...
├── frame-fe-dist-path
│   ├── fe_dis_matrix.npy
│   ├── fe_hash_idx.npy
│   ├── fe_path_matrix.npy
│   ├── frame_dis_matrix.npy
│   ├── frame_hash_idx.npy
│   └── frame_path_matrix.npy
├── glove.6B.200d.txt
├── graph
│   ├── frame_fe.npz
│   ├── frame_frame.npz
│   ├── inter_fe.npz
│   ├── intra_fe.npz
│   └── self_loop.npz
├── intra_frame_fe_relations.npy
├── parsed-v1.5
│   ├── FE.csv
│   ├── feRelations.csv
│   ├── frame.csv
│   ├── frameRelations.csv
│   ├── fulltext
│   ├── LU.csv
│   └── ...
├── test_instance_dic.npy
└── train_instance_dic.npy

TODO: replace these steps with a single shell script, data_preprocess.sh.

Run data_preprocess.py to get data_instance_dic and word/lemma vectors from GloVe.

python data_preprocess.py
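
The *_instance_dic.npy files are presumably pickled Python dicts saved with np.save. A quick way to inspect one of them (an illustrative snippet, not part of the repository; adjust the path to your data folder):

import numpy as np

# Load the pickled instance dictionary produced by data_preprocess.py
train_instances = np.load("train_instance_dic.npy", allow_pickle=True).item()
print(len(train_instances))         # number of instances
print(next(iter(train_instances)))  # one example key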

Then run dep_parsing.py to build dependency trees for each sentence. This modifies the data_instance_dic files produced by data_preprocess.py. We DO NOT parse exemplar_instance_dic because it would take a lot of time; you can modify dep_parsing.py to parse exemplar_instance_dic if you want to pretrain on exemplar sentences.

python dep_parsing.py
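
dep_parsing.py presumably relies on Stanza (the only parser among the requirements) for dependency parsing. For orientation only, a minimal, self-contained sketch of what parsing a single sentence with Stanza looks like (the actual script processes the whole data_instance_dic):

import stanza

# Download the English models once, then build a pipeline up to dependency parsing
stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("The hunter chased the fox across the field.")
for word in doc.sentences[0].words:
    # head is the 1-based index of the governing word (0 means root)
    print(word.id, word.text, word.head, word.deprel)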

Run build_fe_name_dict.py to get fe_label_to_name.npy, which stores a dict mapping FE ids to their names.

python build_fe_name_dict.py

Run parse_fe_xml.py and rel_graph_construct.py to build the adjacency matrices for FKG.
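
Following the pattern of the previous steps, the two scripts can be run directly:

python parse_fe_xml.py
python rel_graph_construct.py

The resulting .npz files under graph/ are sparse adjacency matrices; a small snippet to sanity-check one of them (assuming they were written with scipy.sparse.save_npz):

import scipy.sparse as sp

# Load the frame-FE adjacency matrix of FKG and check its shape and number of edges
frame_fe = sp.load_npz("graph/frame_fe.npz")
print(frame_fe.shape, frame_fe.nnz)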

Pretraining on exemplar sentences:

python pretrain.py --save_model_path [prefix of your model path, e.g. ../model/pretrain_] --epoch 30

Fine-tuning on train instances:

python train.py --pretrain_model_path ../model/pretrain_30.bin --lr 6e-5 --save_model_path [path where you save fine-tuned model]

Train

python train.py --save_model_path [path where you save trained model]

For more details about the arguments, see config.py.

Evaluate

python train.py --mode test --batch_size 2 --save_model_path [path where you save trained model]