BERT-based Biomedical Text Summarizer
Download version 1 or version 2 of the BERT-based biomedical text summarizer.
Extract the zip file.
Download the BERT repository from https://github.com/google-research/bert, and copy the files to the BERT directory already available with the summarizer.
Download a BERT pretrained model from https://github.com/google-research/bert or a BioBERT pretrained model from https://github.com/naver/biobert-pretrained, and copy the files to the BERT directory already available with the summarizer.
Copy your input document (preferably a txt file) to the INPUT directory already available with the summarizer.
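Run the summarizer script, Summarizer.py, from the command line (see the example below).
Four parameters must be specified when running the script: the input file (-i), the output file (-o), the compression rate (-c), and the final number of clusters (-k).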
When the summarization process finishes, the summary can be found in the OUTPUT directory already available with the summarizer.
Example
The following command uses the file Input.txt as the input, runs the summarizer with a compression rate of 30 percent and a final cluster number of 4, and stores the summary in the file Output.txt:
python Summarizer.py -i Input.txt -o Output.txt -c 0.3 -k 4
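To make the four parameters concrete, here is a minimal sketch of how flags like these could be parsed and validated with Python's argparse. It is not taken from Summarizer.py: the destination names (input_file, output_file, compression_rate, clusters) and the accepted value ranges are assumptions made for illustration, and the real script may handle its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four flags shown in the example."""
    parser = argparse.ArgumentParser(description="Sketch of the summarizer CLI")
    parser.add_argument("-i", dest="input_file", required=True,
                        help="input document in the INPUT directory")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="summary file written to the OUTPUT directory")
    parser.add_argument("-c", dest="compression_rate", type=float, required=True,
                        help="compression rate as a fraction, e.g. 0.3 for 30 percent")
    parser.add_argument("-k", dest="clusters", type=int, required=True,
                        help="final number of clusters")
    return parser


def parse_args(argv):
    """Parse argv and reject values outside the assumed valid ranges."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the rate is a fraction of the input, so it must lie in (0, 1].
    if not 0.0 < args.compression_rate <= 1.0:
        parser.error("-c must be in the interval (0, 1]")
    # Assumption: at least one cluster is required.
    if args.clusters < 1:
        parser.error("-k must be a positive integer")
    return args


if __name__ == "__main__":
    args = parse_args(["-i", "Input.txt", "-o", "Output.txt", "-c", "0.3", "-k", "4"])
    print(args.input_file, args.output_file, args.compression_rate, args.clusters)
```

Validating the values up front means a mistyped rate such as 30 instead of 0.3 is reported immediately rather than after the BERT model has been loaded.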
Final evaluation results
Method | ROUGE-1 | ROUGE-2 |
BERT-based summarizer (BERT-large) | 0.7504 | 0.3312 |
BERT-based summarizer (BioBERT-pubmed+pmc) | 0.7411 | 0.3228 |
BERT-based summarizer (BioBERT-pubmed) | 0.7376 | 0.3203 |
CIBS biomedical summarizer | 0.7345 | 0.3187 |
BERT-based summarizer (BioBERT-pmc) | 0.7309 | 0.3164 |
Bayesian biomedical summarizer | 0.7288 | 0.3143 |
BERT-based summarizer (BERT-base) | 0.7257 | 0.3110 |
SUMMA | 0.7098 | 0.3022 |
TexLexAn | 0.6982 | 0.2979 |
Lead baseline | 0.6116 | 0.2311 |
Random baseline | 0.5667 | 0.1999 |
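For readers unfamiliar with the metrics in the table above, the following is a simplified sketch of what ROUGE-N measures: the n-gram overlap between a generated summary and a reference summary. It uses plain whitespace tokenization and computes recall only. This README does not state whether the reported scores are recall, precision, or F-measure, nor which ROUGE toolkit produced them, so treat the function as an illustration of the idea and not as the evaluation code behind these numbers. The two example sentences are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the candidate count so repeated n-grams are not over-credited.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the drug reduced blood pressure in patients"
    candidate = "the drug reduced pressure in patients"
    print(round(rouge_n_recall(candidate, reference, 1), 4))  # 6 of 7 unigrams
    print(round(rouge_n_recall(candidate, reference, 2), 4))  # 4 of 6 bigrams
```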
Parameterization results (Euclidean distance)
K | BERT-base R-1 | BERT-base R-2 | BERT-large R-1 | BERT-large R-2 | BioBERT-pmc R-1 | BioBERT-pmc R-2 | BioBERT-pubmed R-1 | BioBERT-pubmed R-2 | BioBERT-pubmed+pmc R-1 | BioBERT-pubmed+pmc R-2 |
2 | 0.7221 | 0.3087 | 0.7434 | 0.3264 | 0.7243 | 0.3094 | 0.7269 | 0.3122 | 0.7369 | 0.3195 |
3 | 0.7291 | 0.3133 | 0.7457 | 0.3285 | 0.7308 | 0.3172 | 0.7361 | 0.3186 | 0.7429 | 0.3265 |
4 | 0.7224 | 0.3107 | 0.7507 | 0.3329 | 0.7299 | 0.3189 | 0.7354 | 0.3187 | 0.7399 | 0.3234 |
5 | 0.7205 | 0.3114 | 0.7467 | 0.3302 | 0.7272 | 0.3138 | 0.7293 | 0.3183 | 0.7398 | 0.3229 |
6 | 0.7199 | 0.3099 | 0.7415 | 0.3249 | 0.7239 | 0.3134 | 0.7276 | 0.3146 | 0.7352 | 0.3199 |
7 | 0.7157 | 0.3075 | 0.7366 | 0.3208 | 0.7187 | 0.3097 | 0.7226 | 0.3111 | 0.7313 | 0.3170 |
8 | 0.7179 | 0.3079 | 0.7334 | 0.3183 | 0.7194 | 0.3089 | 0.7198 | 0.3074 | 0.7272 | 0.3122 |
9 | 0.7146 | 0.3084 | 0.7291 | 0.3173 | 0.7183 | 0.3099 | 0.7174 | 0.3062 | 0.7273 | 0.3087 |
10 | 0.7127 | 0.3054 | 0.7284 | 0.3137 | 0.7186 | 0.3102 | 0.7162 | 0.3036 | 0.7196 | 0.3080 |
11 | 0.7063 | 0.2990 | 0.7257 | 0.3089 | 0.7148 | 0.3161 | 0.7113 | 0.2992 | 0.7164 | 0.3027 |
12 | 0.7034 | 0.2968 | 0.7203 | 0.3101 | 0.7094 | 0.3088 | 0.7087 | 0.2995 | 0.7117 | 0.3006 |
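Reading a table like the one above comes down to picking the K with the highest score for a given model. The short sketch below does this for the BERT-large ROUGE-1 column under Euclidean distance; the values are copied from the table, while the function name best_k is only an illustrative helper.

```python
# ROUGE-1 scores for BERT-large with Euclidean distance, copied from the table above.
bert_large_rouge1 = {
    2: 0.7434, 3: 0.7457, 4: 0.7507, 5: 0.7467, 6: 0.7415, 7: 0.7366,
    8: 0.7334, 9: 0.7291, 10: 0.7284, 11: 0.7257, 12: 0.7203,
}


def best_k(scores):
    """Return the cluster number K whose score is highest."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    k = best_k(bert_large_rouge1)
    print(k, bert_large_rouge1[k])  # prints: 4 0.7507
```

The result, K = 4, matches the cluster number used in the example command.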
Parameterization results (Cosine similarity)
K | BERT-base R-1 | BERT-base R-2 | BERT-large R-1 | BERT-large R-2 | BioBERT-pmc R-1 | BioBERT-pmc R-2 | BioBERT-pubmed R-1 | BioBERT-pubmed R-2 | BioBERT-pubmed+pmc R-1 | BioBERT-pubmed+pmc R-2 |
2 | 0.7196 | 0.3092 | 0.7328 | 0.3224 | 0.7242 | 0.3117 | 0.7177 | 0.3095 | 0.7285 | 0.3163 |
3 | 0.7169 | 0.3102 | 0.7377 | 0.3275 | 0.7249 | 0.3131 | 0.7224 | 0.3089 | 0.7328 | 0.3204 |
4 | 0.7212 | 0.3107 | 0.7362 | 0.3249 | 0.7272 | 0.3107 | 0.7268 | 0.3184 | 0.7278 | 0.3202 |
5 | 0.7152 | 0.3068 | 0.7361 | 0.3259 | 0.7212 | 0.3082 | 0.7298 | 0.3165 | 0.7295 | 0.3199 |
6 | 0.7136 | 0.3026 | 0.7299 | 0.3205 | 0.7171 | 0.3071 | 0.7261 | 0.3157 | 0.7272 | 0.3160 |
7 | 0.7107 | 0.2984 | 0.7259 | 0.3162 | 0.7173 | 0.3008 | 0.7221 | 0.3126 | 0.7224 | 0.3136 |
8 | 0.7071 | 0.2988 | 0.7231 | 0.3127 | 0.7176 | 0.3049 | 0.7207 | 0.3102 | 0.7199 | 0.3135 |
9 | 0.7037 | 0.2968 | 0.7194 | 0.3094 | 0.7119 | 0.3001 | 0.7170 | 0.3072 | 0.7182 | 0.3099 |
10 | 0.6989 | 0.2917 | 0.7173 | 0.3068 | 0.7073 | 0.2965 | 0.7143 | 0.3056 | 0.7158 | 0.3074 |
11 | 0.6953 | 0.2905 | 0.7146 | 0.3046 | 0.7035 | 0.2954 | 0.7080 | 0.2986 | 0.7126 | 0.3069 |
12 | 0.6908 | 0.2879 | 0.7142 | 0.3018 | 0.6995 | 0.2882 | 0.7033 | 0.2967 | 0.7106 | 0.3034 |
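The two parameterization tables differ only in the measure used to compare vectors: Euclidean distance or cosine similarity. As a reminder of how the two behave, the sketch below computes both for a pair of small vectors using only the standard library. The vectors are toy stand-ins for sentence embeddings, and how the summarizer applies these measures internally is not described here, so this shows the measures themselves and not the summarizer's clustering code.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors (0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy vectors: v points in the same direction as u but is twice as long.
    u = [1.0, 2.0, 3.0]
    v = [2.0, 4.0, 6.0]
    print(round(euclidean_distance(u, v), 4))
    print(round(cosine_similarity(u, v), 4))
```

Because v is a scaled copy of u, the cosine similarity is 1 even though the Euclidean distance is not 0. Cosine similarity ignores vector length and Euclidean distance does not, which is one reason the two measures can lead to different groupings.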