
MVP: Multi-task Supervised Pre-training for Natural Language Generation

This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation (https://arxiv.org/abs/2206.12131). The implementation is built entirely on our text generation library TextBox 2.0.

Overview

[Figure: overview of the MVP model]


Installation

Clone the TextBox repository and follow its installation instructions:

git clone https://github.com/RUCAIBox/TextBox.git && cd TextBox
bash install.sh

Datasets

You can download our datasets for fine-tuning at https://huggingface.co/RUCAIBox. Create a folder named dataset and download the datasets you need, such as cnndm, into it.
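As a minimal sketch, the expected layout could be created as follows, assuming cnndm is hosted as a dataset repository under the RUCAIBox organization (the exact repository path is an assumption; check https://huggingface.co/RUCAIBox for the actual name):

mkdir dataset
# clone the cnndm dataset into dataset/cnndm (repository path assumed; verify on the Hub)
git clone https://huggingface.co/datasets/RUCAIBox/cnndm dataset/cnndm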

We currently support 11 generation tasks and their corresponding datasets.

Fine-tuning, Inference and Evaluation

After downloading a dataset, our code runs fine-tuning, inference, and evaluation as a single pipeline.

We experiment with MVP, MVP+S/M, Single, and BART in our paper; details can be found there.

Fine-tuning with MVP:

python run_textbox.py --model=MVP --dataset=[dataset_name] --model_path=RUCAIBox/mvp

dataset_name can be any of the names under the dataset folder, such as cnndm and webnlg.
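For instance, to fine-tune MVP on cnndm:

python run_textbox.py --model=MVP --dataset=cnndm --model_path=RUCAIBox/mvp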

Fine-tuning with MVP+S/M:

python run_textbox.py --model=MVP --dataset=[dataset_name] --model_path=RUCAIBox/mvp-[task_name]

task_name can be selected from summarization, open-dialog, data-to-text, question-generation, story, question-answering, and task-dialog. To fine-tune MVP+M, set task_name to multi-task.

For example, to fine-tune the squadqg dataset (question generation) with MVP+S:

python run_textbox.py --model=MVP --dataset=squadqg --model_path=RUCAIBox/mvp-question-generation

Fine-tuning with Single and BART:

python run_textbox.py --model=MVP --dataset=[dataset_name] --model_path=RUCAIBox/mtl-[task_name]

task_name can be selected from summarization, open-dialog, data-to-text, question-generation, story, question-answering, and task-dialog.
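For example, to fine-tune the cnndm dataset with the Single model for summarization (following the mtl-[task_name] pattern above):

python run_textbox.py --model=MVP --dataset=cnndm --model_path=RUCAIBox/mtl-summarization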

Fine-tuning with BART is also supported:

python run_textbox.py --model=BART --dataset=[dataset_name] --model_path=facebook/bart-large
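For instance, to fine-tune BART on cnndm:

python run_textbox.py --model=BART --dataset=cnndm --model_path=facebook/bart-large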

Lightweight Tuning:

To conduct lightweight tuning of MVP+S/M, add the option --lightweight_tuning=True to the command.

For example, to lightweight-tune the roc dataset with MVP+M:

python run_textbox.py --model=MVP --dataset=roc --model_path=RUCAIBox/mvp-multi-task --lightweight_tuning=True

Lightweight tuning with BART+R (i.e., prefix-tuning) is also supported.
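As a sketch only, assuming BART+R reuses the same --lightweight_tuning option (an assumption; consult the TextBox documentation for the exact flag):

# flag usage assumed to mirror the MVP+S/M command above
python run_textbox.py --model=BART --dataset=roc --model_path=facebook/bart-large --lightweight_tuning=True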

Citation

@article{tang2022mvp,
  title={MVP: Multi-task Supervised Pre-training for Natural Language Generation},
  author={Tang, Tianyi and Li, Junyi and Zhao, Wayne Xin and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2206.12131},
  year={2022},
  url={https://arxiv.org/abs/2206.12131},
}