awasthiabhijeet / PIE

Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)
MIT License

which source of correct sentences did you use to make the errorful sentences? #21

Closed shikha10799 closed 4 years ago

shikha10799 commented 4 years ago

Hi, you mentioned in the README that in order to construct errorful sentences we need to specify the path to a correct file along with an output path. My question is: from which source did you extract the correct sentences to form the erroneous dataset provided in the repository? I also want to construct a dataset of preposition errors, but first I need a correct dataset for that. Also, kindly provide your suggestions on how I can proceed in constructing a dataset with just preposition errors. Thanks in advance.

awasthiabhijeet commented 4 years ago

Hi @shikha10799 ,

As mentioned in our paper, we used the One Billion Word corpus to create the artificial GEC corpus.

Also Kindly provide your suggestions on how i can proceed in constructing a dataset with just preposition errors.

If you already have a decent-sized GEC corpus, you can estimate the transition probabilities of preposition errors from it (e.g., the probability of Prep-1 being wrongly used in place of Prep-2), and then introduce errors into correct sentences using these estimates.
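A minimal sketch of that idea (not from the PIE codebase; the preposition list, the naive same-length token alignment, and the helper names are illustrative assumptions): count how often each correct preposition appears as a different preposition on the errorful side of an aligned GEC corpus, then corrupt clean sentences by sampling replacements in proportion to those counts.

```python
import random
from collections import Counter, defaultdict

# Hypothetical closed set of prepositions; a real setup would use a POS tagger.
PREPOSITIONS = {"in", "on", "at", "of", "for", "to", "with", "by"}

def estimate_confusions(parallel_pairs):
    """Count how often each correct preposition was written as another
    preposition in the errorful side of an aligned (errorful, correct) corpus."""
    counts = defaultdict(Counter)
    for errorful, correct in parallel_pairs:
        err_toks, cor_toks = errorful.split(), correct.split()
        # Naive positional alignment: only usable when lengths match.
        if len(err_toks) != len(cor_toks):
            continue
        for e, c in zip(err_toks, cor_toks):
            if c in PREPOSITIONS and e in PREPOSITIONS and e != c:
                counts[c][e] += 1
    return counts

def inject_errors(sentence, confusions, error_prob=0.3, rng=random):
    """Replace prepositions in a correct sentence, sampling wrong
    prepositions in proportion to the estimated confusion counts."""
    out = []
    for tok in sentence.split():
        alts = confusions.get(tok)
        if alts and rng.random() < error_prob:
            cands, weights = zip(*alts.items())
            tok = rng.choices(cands, weights=weights)[0]
        out.append(tok)
    return " ".join(out)

# Toy aligned corpus: (errorful, correct) pairs.
pairs = [
    ("she is good in math", "she is good at math"),
    ("he arrived to the station", "he arrived at the station"),
    ("good in chess", "good at chess"),
]
conf = estimate_confusions(pairs)
print(inject_errors("she is good at chess", conf, error_prob=1.0,
                    rng=random.Random(0)))
```

With `error_prob=1.0` every preposition that has observed confusions is replaced, so "at" becomes "in" or "to" with probabilities 2/3 and 1/3 from the toy counts above; in practice you would keep `error_prob` low so most sentences stay mostly clean.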