Jason3900 / gector-fast

A faster, simpler and distributed implementation of GECToR, a seq2edit GEC model
Apache License 2.0

FastGECToR

Introduction

A faster and simpler implementation of GECToR – Grammatical Error Correction: Tag, Not Rewrite [1], with AMP (automatic mixed precision) and distributed training support via DeepSpeed. To make the code faster and more readable, we removed the AllenNLP dependencies and rewrote the related code.

NOTE: This project is now maintained by cofe-ai. Updates and issue fixes are published at https://github.com/cofe-ai/fast-gector ; please refer to that repository.

Requirements

  1. Install PyTorch with CUDA support

    conda create -n gector_env python=3.7.6 -y
    conda activate gector_env
    conda install pytorch=1.10.1 cudatoolkit -c pytorch
  2. Install NVIDIA Apex (needed to use AMP with DeepSpeed)

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  3. Install the following packages with conda or pip

    python==3.7.6
    transformers==4.14.1
    scikit-learn==1.0.2
    numpy==1.21.2
    deepspeed==0.5.10
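
Because the versions above are pinned exactly, it can help to confirm what is actually installed before training. The following is only a minimal sketch and not part of this repository: the helper name `check_pins` is hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
# Pinned versions copied from the requirements list above.
PINNED = {
    "transformers": "4.14.1",
    "scikit-learn": "1.0.2",
    "numpy": "1.21.2",
    "deepspeed": "0.5.10",
}

try:
    # Standard library on Python 3.8 and later.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import version, PackageNotFoundError


def check_pins(pins, get_version=version):
    """Return {package: (wanted, found)} for missing or mismatched packages.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

An empty result means every pinned package matches; anything printed is a package to reinstall at the listed version.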

Preprocess Data

  1. Tokenize your data (one sentence per line, words separated by spaces)

  2. Generate edits from the parallel sentences (source and target files)

    python utils/preprocess_data.py -s source_file -t target_file -o output_edit_file
  3. (Optional) Define your own target vocabulary (data/vocabulary/labels.txt)
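
To give a feel for what "generating edits" means, here is a simplified sketch of turning one whitespace-tokenized sentence pair into per-token edit tags. It is not the logic of utils/preprocess_data.py and does not reproduce its output file format. It swaps in Python's `difflib.SequenceMatcher` for the alignment, and it assumes the tag names `$KEEP`, `$DELETE`, `$APPEND_x`, `$REPLACE_x` and a leading `$START` slot in the style of the GECToR paper. The function names `tokenize` and `token_edits` are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(line):
    """Whitespace tokenization: one sentence per line, words split by spaces."""
    return line.split()


def token_edits(source, target):
    """Return [(source_token, [tags])], with a leading $START slot.

    The $START slot receives insertions that happen before the first word.
    """
    src, tgt = tokenize(source), tokenize(target)
    slots = ["$START"] + src          # slot k + 1 belongs to src[k]
    tags = [[] for _ in slots]
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":
            for i in range(i1, i2):
                tags[i + 1].append("$DELETE")
        elif op == "insert":
            # Attach inserted words to the token just before the gap.
            tags[i1].extend(f"$APPEND_{tok}" for tok in tgt[j1:j2])
        elif op == "replace":
            n = min(i2 - i1, j2 - j1)
            for k in range(n):
                tags[i1 + k + 1].append(f"$REPLACE_{tgt[j1 + k]}")
            # Extra source words are deleted.
            for i in range(i1 + n, i2):
                tags[i + 1].append("$DELETE")
            # Extra target words are appended after the last replaced word.
            tags[i1 + n].extend(f"$APPEND_{tok}" for tok in tgt[j1 + n:j2])
    # Tokens untouched by any edit keep their original form.
    return [(tok, t or ["$KEEP"]) for tok, t in zip(slots, tags)]
```

For example, `token_edits("He go to school yesterday", "He went to the school yesterday")` tags `go` with `$REPLACE_went` and `to` with `$APPEND_the`, while every other slot gets `$KEEP`. A custom target vocabulary, as in step 3, would then restrict which of these tags the model is allowed to predict.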

Train Model

Inference

Reference

[1] Omelianchuk, K., Atrasevych, V., Chernodub, A., & Skurzhanskyi, O. (2020). GECToR -- Grammatical Error Correction: Tag, Not Rewrite. arXiv:2005.12592 [cs]. http://arxiv.org/abs/2005.12592