caoyu-noob / D3

The implementation for the ACL 2022 paper

About the metrics. #4

Closed benjpau closed 2 years ago

benjpau commented 2 years ago

Does the C-score metric in the paper represent the entail_score in the external_metrics_func function?

Also, is the model used to calculate the entail_score the same as the model used for Persona distillation?

Could you share your model for calculating the C-score?

Furthermore, I did not find the BSf metric in the results, how can I calculate this?

Thanks a lot!

caoyu-noob commented 2 years ago
  1. Yes, entail_score is exactly the C-score used in the paper.
  2. Yes, the model used to calculate entail_score is the same as the model used for *Persona distillation*.
  3. I trained the C-score model by fine-tuning a RoBERTa-large-MNLI model on the dataset from https://arxiv.org/pdf/1811.00671.pdf . Please give me some time to locate the checkpoint, since this is an old project from a year ago.
  4. The script to calculate BERT score is under the path bert_score. Please follow https://github.com/Tiiiger/bert_score to make preparations (download the model). In addition, to have the BERT score reported in your experiment, you need to pass the path of the BERT score model in the shell script by adding the argument --bert_score_model_path $YOUR_PATH; otherwise this step is skipped.
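For reference, an entailment-based consistency score of the kind described in point 3 can be sketched with Hugging Face transformers. This is only an illustration, not the authors' code: the paper's checkpoint was further fine-tuned on the dialogue NLI dataset linked above, so the off-the-shelf `roberta-large-mnli` model used here is a stand-in and will give different numbers.

```python
# Hedged sketch: score how strongly a response entails a persona sentence,
# using the public roberta-large-mnli checkpoint as a stand-in for the
# paper's fine-tuned NLI model.
from transformers import pipeline

# top_k=None returns scores for all three MNLI labels.
nli = pipeline("text-classification", model="roberta-large-mnli", top_k=None)

def entail_score(persona: str, response: str) -> float:
    """Probability that `response` entails `persona` (premise/hypothesis pair)."""
    scores = nli({"text": persona, "text_pair": response})
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

print(entail_score("I love hiking in the mountains.", "I go hiking every weekend."))
```

Averaging this probability (or the entailment-label accuracy) over all generated responses and their persona sentences would yield a corpus-level consistency score.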
benjpau commented 2 years ago

Thanks for your reply!

For the BERT score, I ran train_gpt2.sh and specified bert_score_model_path as roberta-large. However, the BERT score F1 I obtained is 0.8603, which suggests the score is not rescaled. Could you tell me how to get the rescaled score? Thanks a lot!

caoyu-noob commented 2 years ago

Hi, I have uploaded the trained NLI model for distillation; you can find it in the updated README. I have also added the rescale baseline file for BERT score. You should add --rescale_with_baseline in your shell script to get the rescaled results.