chentao1999 / MedicalRelationExtraction

This repository supports training and testing a BERT-CNN model on three medical relation extraction corpora: the BioCreative V CDR task corpus, a traditional Chinese medicine literature corpus, and the i2b2 temporal relation corpus.
MIT License

This is an implementation of the BERT-CNN model used in our paper "A General Approach for Improving Deep Learning-based Medical Relation Extraction using a Pre-trained Model and Fine-tuning".

The repository supports training and testing the BERT-CNN model on three medical relation extraction corpora: the BioCreative V CDR task corpus (BC5CDR corpus for short), the traditional Chinese medicine (TCM) literature corpus (TCM corpus for short), and the 2012 Informatics for Integrating Biology and the Bedside (i2b2) temporal relations challenge corpus (i2b2 temporal corpus for short). These scripts are based on google-research/bert and cjymz886/text_bert_cnn. Thanks!

Requirement

Datasets

Usage

1) Download the data above.

For the BC5CDR corpus, unzip CDR_Data.zip into ./corpus/BC5CDR. The files in this folder look like this:

For the TCM corpus, unzip TCMRelationExtraction.zip into ./corpus/TCM. The files in this folder look like this:

For the i2b2 temporal corpus, extract 2012-07-15.original-annotation.release.tar.gz and 2012-08-23.test-data.groundtruth.tar.gz into ./corpus/i2b2. The files in this folder look like this:

2) Download the pre-trained BERT models.

Download the uncased_L-12_H-768_A-12 and chinese_L-12_H-768_A-12 BERT models from https://github.com/google-research/bert.

Unzip uncased_L-12_H-768_A-12.zip and chinese_L-12_H-768_A-12.zip into the folder ./pretrained_bert_model.
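Before training, it can help to confirm that the BERT archives were unpacked correctly. The following is a minimal sketch, not part of this repository: the helper name `missing_bert_files` is hypothetical, and the sketch assumes that each archive unpacks into a sub-folder named after the model and contains the standard files of a Google BERT TensorFlow release.

```python
from pathlib import Path

# Files shipped in each Google BERT release archive (TensorFlow checkpoint format).
EXPECTED_FILES = (
    "bert_config.json",
    "vocab.txt",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "bert_model.ckpt.data-00000-of-00001",
)


def missing_bert_files(model_dir):
    """Return the names of expected BERT files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: each archive unpacks into a sub-folder named after the model.
    for model in ("uncased_L-12_H-768_A-12", "chinese_L-12_H-768_A-12"):
        missing = missing_bert_files(Path("pretrained_bert_model") / model)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(model, status)
```

Run from the repository root, the script prints one status line per model; an empty list from `missing_bert_files` means all expected files are present.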

3) Run

For the BC5CDR corpus, run:

    python data_process_BC5CDR.py
    python run_BC5CDR.py train
    python run_BC5CDR.py test

For the TCM corpus, run:

    python data_process_TCM.py
    python run_TCM.py train
    python run_TCM.py test

For the i2b2 temporal corpus, run:

    python data_process_i2b2.py
    python run_i2b2.py train
    python run_i2b2.py test

The best model will be saved in the folder ./result/checkpoints/.
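The three commands listed for each corpus can also be chained from a single Python wrapper so that a failure in preprocessing or training stops the run early. This is only a sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical helpers that do not exist in this repository, and the script names are taken directly from the commands above.

```python
import subprocess
import sys

# Corpus labels matching the script name suffixes used in the commands above.
CORPORA = ("BC5CDR", "TCM", "i2b2")


def build_commands(corpus, python=sys.executable):
    """Return the preprocess, train, and test commands for one corpus."""
    if corpus not in CORPORA:
        raise ValueError("unknown corpus: %r (expected one of %s)" % (corpus, CORPORA))
    return [
        [python, "data_process_%s.py" % corpus],
        [python, "run_%s.py" % corpus, "train"],
        [python, "run_%s.py" % corpus, "test"],
    ]


def run_pipeline(corpus):
    """Run the three steps in order; stop at the first non-zero exit code."""
    for cmd in build_commands(corpus):
        print("Running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError if the step fails.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("TCM")` from the repository root would execute the same three steps as the TCM commands above, using the current Python interpreter.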