awasthiabhijeet / PIE

Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)
MIT License

About Synthetic training #7

Closed. Serenade-J closed this issue 4 years ago.

Serenade-J commented 4 years ago

Hi, I am working with PIE-BERT-Base. Using the released synthetic data (I chose a2), I performed 2 epochs of training on a2 and then fine-tuned on Lang-8 + FCE + NUCLE for 2 epochs with PIE-Base. I found that the synthetic training did not improve the model. During the synthetic-training stage, I observed that the loss initially fell and then jumped around between roughly 10 and 14 until the 2-epoch training finished. I used the pickles you mentioned in another issue for both synthetic training and fine-tuning. Hyperparameters were copied from Appendix A.5 for PIE-Base.
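For concreteness, here is a minimal sketch of the two-stage schedule I described above (synthetic pre-training on a2, then fine-tuning on Lang-8 + FCE + NUCLE). The file names and the config layout are illustrative placeholders, not PIE's actual scripts or flags:

```python
# Hypothetical outline of the two-stage schedule; paths are placeholders only.
TRAINING_STAGES = [
    {
        "name": "synthetic_pretraining",
        "data": "a2_synthetic_edits.pkl",     # released a2 synthetic data (placeholder path)
        "epochs": 2,
    },
    {
        "name": "finetuning",
        "data": "lang8_fce_nucle_edits.pkl",  # Lang-8 + FCE + NUCLE (placeholder path)
        "epochs": 2,
    },
]

for stage in TRAINING_STAGES:
    print(f"{stage['name']}: {stage['epochs']} epochs on {stage['data']}")
```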

Thank you so much.

awasthiabhijeet commented 4 years ago

I found that the synthetic training did not improve the model.

Do you mean no improvement compared to directly fine-tuning PIE on Lang-8 + FCE + NUCLE for 2 epochs?

I wonder if you have experimental results showing that synthetic training boosts PIE-Base, and by how many F-0.5 points?

We only tried synthetic training with PIE-Large, and the improvements are reported in the paper. The score corresponding to PIE-Base (56.6) is without synthetic training.
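For reference, F-0.5 is the standard precision-weighted F-score used in GEC evaluation, where precision is weighted twice as heavily as recall. A minimal sketch of the computation; the example numbers are illustrative, not results from the paper:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Generic F-beta score; GEC papers report F-0.5, which favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative numbers only:
print(round(f_beta(precision=0.66, recall=0.40), 3))  # 0.584
```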

And have you observed the same phenomenon with the loss values during synthetic training?

Sorry, I did not check whether something like this was happening.

Is there anything else to note about synthetic training?

The focus of our work is not on synthetic training; we simply used it as a heuristic to gain a slight boost in performance. I would point you to the EMNLP 2019 paper "An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction", which describes what may be a better way to generate synthetic data for GEC.

Thanks

Serenade-J commented 4 years ago

Will the learning rate be reset to 2e-5 when fine-tuning starts?

Serenade-J commented 4 years ago

@awasthiabhijeet

awasthiabhijeet commented 4 years ago

Will the learning rate be reset to 2e-5 when fine-tuning starts?

Sorry for the delayed reply. Yes, after saving the checkpoint obtained by training PIE on the synthetic GEC data, the learning rate (and the optimizer) is reset when fine-tuning PIE further on the GEC corpus. There was no specific reason behind this choice; I did not explore resuming fine-tuning with the learning rate reached at the end of synthetic training.
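A minimal sketch of that reset, assuming a PyTorch-style workflow (PIE itself builds on the TensorFlow BERT code; the model class, checkpoint path, and checkpoint layout below are hypothetical placeholders): only the model weights are carried over from synthetic training, while the optimizer, including its learning rate and any accumulated state, is created fresh for fine-tuning.

```python
import torch
from torch import nn

# Stand-in model and hypothetical checkpoint path/layout, for illustration only.
model = nn.Linear(768, 768)
state = torch.load("synthetic_pretraining_checkpoint.pt")
model.load_state_dict(state["model"])

# Fresh optimizer for fine-tuning: no Adam moments or learning-rate schedule
# is carried over, and the learning rate starts again at 2e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
```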