YJiangcm / PromCSE

Code for "Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning (EMNLP 2022)"
https://arxiv.org/abs/2203.06875v2

In a supervised setting, why are the test-set results evaluated during training? #2

Closed Gyyyym closed 1 year ago

Gyyyym commented 2 years ago

Hello, in a supervised setting, why are the test-set results evaluated during training? Doesn't that make the test-set results artificially high? I don't quite understand this part, please advise. Thanks.

YJiangcm commented 2 years ago

Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?

Gyyyym commented 2 years ago

> Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?

I use a dataset from another task. The accuracy on the training, validation, and test sets all reach 100%, yet the loss value stays above 4.0, which seems very strange to me. I see that there is 5-fold cross-validation in the code. Is the model also trained on the test set, resulting in the very high accuracy? And why is 5-fold cross-validation used during evaluation? Thank you very much for your reply.

YJiangcm commented 2 years ago

Why 5-fold cross-validation is used: after the model is well trained, it can derive sentence embedding vectors, which can be used directly to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation following the SentEval toolkit (https://github.com/facebookresearch/SentEval) to evaluate performance on transfer tasks.
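A minimal sketch of what this SentEval-style evaluation does, using scikit-learn and random placeholder features (in practice, the features would be the frozen sentence embeddings produced by the trained encoder, and the labels would come from the transfer-task dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder for frozen sentence embeddings (n_sentences x hidden_dim)
# and binary classification labels; random here purely for illustration.
X = rng.normal(size=(200, 768))
y = rng.integers(0, 2, size=200)

# SentEval-style transfer evaluation: the encoder stays frozen, and only
# a logistic regression classifier is fit on top of the embeddings,
# scored with 5-fold cross-validation.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracies: {scores}, mean: {scores.mean():.3f}")
```

Note that the cross-validation folds here are drawn only from the transfer task's own data; the splits used to train the sentence encoder itself are not involved at this stage.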

Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0": I haven't come across such a phenomenon yet. Which dataset did you use?

Gyyyym commented 2 years ago

> Why 5-fold cross-validation is used: after the model is well trained, it can derive sentence embedding vectors, which can be used directly to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation following the SentEval toolkit (https://github.com/facebookresearch/SentEval) to evaluate performance on transfer tasks.
>
> Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0": I haven't come across such a phenomenon yet. Which dataset did you use?

I use the CMV dataset. I don't know why the loss value doesn't drop while the accuracy still improves.