RandyZhouRan / MELM

Code for "MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER"

Do MELM training and NER model training share the same dev set? #10

Closed · dr-GitHub-account closed this issue 1 year ago

dr-GitHub-account commented 1 year ago

Thank you for sharing the code. I noticed that MELM training and NER model training both use a dev set. I am wondering whether they use the same dev set. If so, is it appropriate to use the augmented training set and the shared dev set to train and select the NER model, given that the dev set has already served as a criterion during augmentation?

RandyZhouRan commented 1 year ago

Hi, thanks for your interest in our paper. Yes, both stages use the same source dev set. The first stage uses a language-modeling metric on the dev set to select the augmentation model, while the second stage focuses on the NER task and uses F1 as the metric. So I think there is no risk of 'overfitting' to the dev set. Hope this clarifies.
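
For concreteness, here is a rough sketch of how the same dev set can drive two different selection criteria. The function and metric names are illustrative placeholders, not the actual code in this repo:

```python
import math

# Illustrative sketch only: the same dev set is reused in both stages,
# but each stage selects its checkpoint with a different metric.

def select_melm_checkpoint(checkpoints, dev_sentences, lm_loss_fn):
    """Stage 1: pick the MELM checkpoint with the lowest dev perplexity."""
    def dev_perplexity(ckpt):
        losses = [lm_loss_fn(ckpt, sent) for sent in dev_sentences]
        return math.exp(sum(losses) / len(losses))
    return min(checkpoints, key=dev_perplexity)

def select_ner_checkpoint(checkpoints, dev_sentences, f1_fn):
    """Stage 2: pick the NER checkpoint with the highest dev F1."""
    return max(checkpoints, key=lambda ckpt: f1_fn(ckpt, dev_sentences))
```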

dr-GitHub-account commented 1 year ago

Thank you! I am wondering why MELM fine-tuning needs a dev set, given that BERT pretraining (i.e., Masked Language Modeling and Next Sentence Prediction) does not require one.

I did notice one difference: MELM fine-tuning uses a dataset containing entity labels, while BERT pretraining does not. However, MELM is not trained to predict the entity labels. Both MELM fine-tuning and BERT pretraining train a model on self-supervised cloze tests, for which a dev set does not seem necessary.
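
For concreteness, here is the difference in masking as I understand it (a minimal sketch with made-up names and ratios, not the code from this repo):

```python
import random

MASK = "[MASK]"

def bert_style_masking(tokens, mask_prob=0.15):
    """BERT-style MLM: any token can be masked; entity labels are not used."""
    return [MASK if random.random() < mask_prob else tok for tok in tokens]

def entity_masking(tokens, labels, mask_ratio=0.7):
    """Entity-focused masking: only tokens with a non-'O' NER label are
    candidates for masking (mask_ratio here is a made-up value)."""
    return [
        MASK if lab != "O" and random.random() < mask_ratio else tok
        for tok, lab in zip(tokens, labels)
    ]

tokens = ["Obama", "visited", "Berlin", "yesterday"]
labels = ["B-PER", "O", "B-LOC", "O"]
print(entity_masking(tokens, labels))
# e.g. ['[MASK]', 'visited', '[MASK]', 'yesterday']
```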

RandyZhouRan commented 1 year ago

We could either use a dev set to do early stopping or just train for a fixed number of epochs. Both could work.
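
For example, a minimal sketch of the two options (illustrative only; the hyperparameters and function names are placeholders, not this repo's training loop):

```python
def train_with_early_stopping(train_one_epoch, eval_dev_loss, patience=3, max_epochs=50):
    """Option (a): stop once dev loss stops improving for `patience` epochs."""
    best_loss, bad_epochs, best_state = float("inf"), 0, None
    for epoch in range(max_epochs):
        state = train_one_epoch(epoch)
        loss = eval_dev_loss(state)
        if loss < best_loss:
            best_loss, bad_epochs, best_state = loss, 0, state
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_state

def train_fixed_epochs(train_one_epoch, num_epochs=5):
    """Option (b): no dev set needed, just train for a fixed budget."""
    state = None
    for epoch in range(num_epochs):
        state = train_one_epoch(epoch)
    return state
```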

dr-GitHub-account commented 1 year ago

That makes sense. Thank you for your help!