A faster and simpler implementation of GECToR – Grammatical Error Correction: Tag, Not Rewrite [1], with AMP (automatic mixed precision) and distributed training support via DeepSpeed. To make the code faster and more readable, we removed the AllenNLP dependency and reimplemented the related code.
NOTE: This project is now maintained by cofe-ai. Updates and issue fixes are published at https://github.com/cofe-ai/fast-gector ; please check there.
Install PyTorch with CUDA support
conda create -n gector_env python=3.7.6 -y
conda activate gector_env
conda install pytorch=1.10.1 cudatoolkit -c pytorch
Install NVIDIA Apex (required for using AMP with DeepSpeed)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Install the following packages with conda or pip
python==3.7.6
transformers==4.14.1
scikit-learn==1.0.2
numpy==1.21.2
deepspeed==0.5.10
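Since the pins above are fairly old, it can help to confirm that the active environment actually matches them before training. The following is a minimal sketch, not part of this repository: the `check_pins` helper and the `PINNED` dictionary are hypothetical names, and on Python 3.7 it assumes the `importlib_metadata` backport is installed (it normally is, as a dependency of transformers).

```python
# Sketch: compare installed package versions against the pins listed above.
# `check_pins` and `PINNED` are illustrative names, not part of this repo.
import sys

try:  # Python 3.8+
    from importlib.metadata import version, PackageNotFoundError
except ImportError:  # Python 3.7: assumes the importlib_metadata backport
    from importlib_metadata import version, PackageNotFoundError

# Pins copied from the package list above.
PINNED = {
    "transformers": "4.14.1",
    "scikit-learn": "1.0.2",
    "numpy": "1.21.2",
    "deepspeed": "0.5.10",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the listed version. Any line printed in the loop names a package to reinstall.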
Tokenize your data (one sentence per line, with words separated by spaces)
Generate edits from parallel sentences
python utils/preprocess_data.py -s source_file -t target_file -o output_edit_file
(Optional) Define your own target vocabulary (data/vocabulary/labels.txt)
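To give an intuition for what "edits" means in the data preparation steps above, the sketch below derives token-level tags in the style described in the GECToR paper [1] (`$KEEP`, `$DELETE`, `$REPLACE_x`, `$APPEND_x`) from one source/target sentence pair. It is an illustration only. It uses Python's `difflib` for alignment, which is an assumption and not necessarily how utils/preprocess_data.py aligns sentences. It omits the paper's grammar-specific transformation tags, and the `edit_tags` function name and its output format are hypothetical.

```python
# Sketch: derive GECToR-style edit tags for one tokenized sentence pair.
# Alignment via difflib is an assumption; the repo's script may differ.
from difflib import SequenceMatcher

START = "$START"  # dummy first token so insertions at the front have an anchor


def edit_tags(source, target):
    """Return [(source_token, [tags])] for space-tokenized sentences."""
    src = [START] + source.split()
    tgt = [START] + target.split()
    tags = [[] for _ in src]
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "delete":
            for i in range(i1, i2):
                tags[i].append("$DELETE")
        elif op == "insert":
            # Attach inserted tokens to the source token just before them.
            for tok in tgt[j1:j2]:
                tags[i1 - 1].append("$APPEND_" + tok)
        else:  # "replace": pair tokens up, then handle any leftovers
            n_src, n_tgt = i2 - i1, j2 - j1
            for k in range(min(n_src, n_tgt)):
                tags[i1 + k].append("$REPLACE_" + tgt[j1 + k])
            for i in range(i1 + n_tgt, i2):  # surplus source tokens
                tags[i].append("$DELETE")
            for tok in tgt[j1 + n_src:j2]:  # surplus target tokens
                tags[i2 - 1].append("$APPEND_" + tok)
    return [(tok, t or ["$KEEP"]) for tok, t in zip(src, tags)]


if __name__ == "__main__":
    pair = ("She go to school yesterday", "She went to the school yesterday")
    for token, token_tags in edit_tags(*pair):
        print(token, " ".join(token_tags))
```

In this sketch a source token may receive more than one tag, for example a replacement followed by an append. The paper handles such cases by applying the tagger iteratively, so a real preprocessing script has to decide how to reduce them to one tag per token.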
bash scripts/train.sh
bash scripts/predict.sh
[1] Omelianchuk, K., Atrasevych, V., Chernodub, A., & Skurzhanskyi, O. (2020). GECToR -- Grammatical Error Correction: Tag, Not Rewrite. arXiv:2005.12592 [cs]. http://arxiv.org/abs/2005.12592