Open pribadihcr opened 1 year ago
You need to change the SOURCE and TARGET paths to point to your own dataset. The .src and .trg files are plain-text files with one document per line: .src holds the original text and .trg holds the corrected version of the same line. Running the script on them produces a file of edits, and that edits file is what the model is trained on.
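To make the expected layout concrete, here is a minimal Python sketch, not taken from the repository, that writes a line-aligned .src/.trg pair and checks that the two files have the same number of lines. The helper names (`write_parallel`, `read_parallel`, `token_edits`) and the example sentences are hypothetical, and the `difflib` diff at the end is only an illustration of what an "edit" between a source and a target line looks like; it is not the edit format the repository's script writes.

```python
import difflib
from pathlib import Path


def write_parallel(pairs, src_path, trg_path):
    """Write (original, corrected) pairs to two line-aligned text files.

    Assumes no sentence contains a newline, so line i of the .src file
    always corresponds to line i of the .trg file.
    """
    src_lines = [original for original, _ in pairs]
    trg_lines = [corrected for _, corrected in pairs]
    Path(src_path).write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    Path(trg_path).write_text("\n".join(trg_lines) + "\n", encoding="utf-8")


def read_parallel(src_path, trg_path):
    """Read both files back and fail loudly if they are not aligned."""
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    trg = Path(trg_path).read_text(encoding="utf-8").splitlines()
    if len(src) != len(trg):
        raise ValueError(f"line count mismatch: {len(src)} vs {len(trg)}")
    return list(zip(src, trg))


def token_edits(src_line, trg_line):
    """Illustrative token-level diff (NOT the repository's edit format)."""
    src_tokens, trg_tokens = src_line.split(), trg_line.split()
    matcher = difflib.SequenceMatcher(a=src_tokens, b=trg_tokens, autojunk=False)
    return [
        (tag, src_tokens[i1:i2], trg_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    import tempfile

    # Hypothetical example data; replace with your own sentence pairs.
    pairs = [
        ("He go to school", "He goes to school"),
        ("I am fine", "I am fine"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        src_file = Path(tmp) / "train-stage2.src"
        trg_file = Path(tmp) / "train-stage2.trg"
        write_parallel(pairs, src_file, trg_file)
        for src_line, trg_line in read_parallel(src_file, trg_file):
            print(src_line, "->", token_edits(src_line, trg_line))
```

Once files like these exist for your own data, SOURCE and TARGET in the prepare-data script would point at them, and the script itself generates the file named by OUTPUT.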
Hi, how can I get the data mentioned in the prepare data script?
SUBSET="train-stage2"
SOURCE="../gec_private_train_data/${SUBSET}.src"
TARGET="../gec_private_train_data/${SUBSET}.trg"
OUTPUT="../gec_private_train_data/${SUBSET}.edits"