MultimodalGEC

This repository covers the multimodal GEC data creation for the paper "Improving Grammatical Error Correction with Multimodal Feature Integration", which was accepted to the Findings of ACL 2023.

Citation

@inproceedings{fang-etal-2023-improving,
    title = "Improving Grammatical Error Correction with Multimodal Feature Integration",
    author = "Fang, Tao  and
      Hu, Jinpeng  and
      Wong, Derek F.  and
      Wan, Xiang  and
      Chao, Lidia S.  and
      Chang, Tsung-Hui",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.594",
    pages = "9328--9344",
}

Introduction

Figure: Overall framework of the multimodal GEC model.

Grammatical error correction (GEC) is a promising task aimed at correcting errors in a text. Many methods have been proposed to facilitate this task with remarkable results. However, most of them focus only on enhancing textual feature extraction and do not explore information from other modalities (e.g., speech), which can also provide valuable knowledge to help the model detect grammatical errors. To address this deficiency, we propose a novel framework that integrates both speech and text features to enhance GEC. In detail, we create new multimodal GEC datasets for English and German by generating audio from text using advanced text-to-speech models. Subsequently, we extract acoustic and textual representations with a multimodal encoder that consists of a speech encoder and a text encoder. A mixture-of-experts (MoE) layer is employed to selectively align representations from the two modalities, and then a dot attention mechanism is used to fuse them into the final multimodal representations.
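To make the last two steps more concrete, here is a minimal, dependency-free sketch of a softmax-gated mixture of linear experts followed by dot attention fusion. It is an illustration under stated assumptions, not the model's actual implementation: the dense gating, the linear experts, the residual addition, the toy dimensions, and every function name (`moe_layer`, `dot_attention_fuse`, and so on) are hypothetical and chosen only for readability.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_layer(vector, gate_weights, expert_weights):
    """Mix the outputs of linear experts using softmax gate probabilities."""
    gate = softmax(matvec(gate_weights, vector))  # one probability per expert
    outputs = [matvec(expert, vector) for expert in expert_weights]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]


def dot_attention_fuse(text_vectors, speech_vectors):
    """For each text vector, attend over speech vectors and add the context."""
    fused = []
    for query in text_vectors:
        scores = [sum(q * k for q, k in zip(query, key)) for key in speech_vectors]
        weights = softmax(scores)
        context = [
            sum(w * vec[i] for w, vec in zip(weights, speech_vectors))
            for i in range(len(query))
        ]
        # Assumption: fuse by residual addition of the attended speech context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


def random_matrix(rng, rows, cols):
    """Toy stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, num_experts = 4, 3
text = random_matrix(rng, 5, dim)    # 5 text token vectors
speech = random_matrix(rng, 7, dim)  # 7 speech frame vectors
gate_w = random_matrix(rng, num_experts, dim)
experts_w = [random_matrix(rng, dim, dim) for _ in range(num_experts)]

aligned_speech = [moe_layer(frame, gate_w, experts_w) for frame in speech]
multimodal = dot_attention_fuse(text, aligned_speech)
print(len(multimodal), len(multimodal[0]))  # prints: 5 4
```

The fused output keeps one vector per text token, so under these assumptions it could be passed to a standard text decoder without changing the target-side sequence length.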

Multimodal Data Creation

English GEC Multimodal Data

Because the English cLang-8 dataset is large and the corresponding speech data requires substantial storage, we do not release the English audio directly. Instead, we share the detailed methodology for generating the English multimodal speech data.
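As a rough illustration of what such a generation pipeline might look like, the sketch below walks over source sentences, produces one audio file per sentence, and records the pairing in a manifest. The text-to-speech model, file layout, and manifest format are not specified here, so everything in this sketch is an assumption: `synthesize_placeholder` is a hypothetical stand-in that only writes silence, and `build_speech_manifest` and the TSV columns are illustrative names. Replace the placeholder with a call to the text-to-speech model you actually use.

```python
import csv
import tempfile
import wave
from pathlib import Path


def synthesize_placeholder(text, wav_path, sample_rate=16000):
    """Hypothetical stand-in for a real TTS call: writes silence only.

    The clip length grows with the number of whitespace-separated tokens.
    """
    num_frames = sample_rate * max(1, len(text.split())) // 4  # 0.25 s per token
    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)


def build_speech_manifest(sentences, out_dir, synthesize=synthesize_placeholder):
    """Create one WAV per source sentence and a TSV manifest linking them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / "manifest.tsv"
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["id", "audio", "text"])
        for index, sentence in enumerate(sentences):
            wav_path = out_dir / f"{index}.wav"
            synthesize(sentence, wav_path)
            writer.writerow([index, wav_path.name, sentence])
    return manifest_path


# Toy ungrammatical source sentences (made up for this example).
sentences = ["She go to school yesterday .", "I has two cat ."]
work_dir = tempfile.mkdtemp()
manifest = build_speech_manifest(sentences, work_dir)
print(manifest.read_text(encoding="utf-8"))
```

Keeping the synthesis step behind a single callable means the loop and manifest logic stay the same regardless of which text-to-speech system is plugged in.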

German GEC Multimodal Data

Because the German dataset is relatively small, we have released all of its speech data directly; you can download it from the provided link.

Training and Evaluation

English

   sh train_and_eval_English.sh

German

   sh train_and_eval_German.sh