Closed Rexhaif closed 1 year ago
Hi @Rexhaif, I think your problem is with the models you are using. Have you tried these ones?
eamt22-cometinho-da
eamt22-prune-comet-da
The confusion is that the "first" versions of COMETINHO were actually trained for WMT 21. They are not distilled versions of larger COMET models, but rather smaller encoders trained on the same data. Then, for the EAMT conference, we experimented with distilling larger models (which allowed us to train the smaller encoder on much more data); those are the results presented in the COMETINHO paper.
Sorry about the confusing names. I hope that with this you are able to reproduce the results.
Hi, thanks for the clarification and the model names!
Are there any distilled/pruned COMETINHO models available that were trained on the MQM scores rather than the DA scores?
Unfortunately no.
🐛 Bug
The COMETINHO paper states that the pruned model achieves a Kendall tau of 0.274 on the news subsection of the WMT21 EN-RU MQM dataset, and the distilled model achieves 0.263. However, a simple reproduction script using the wmt21-cometinho-mqm model fails to show similar results on the same data.
To Reproduce
Download the cometinho-mqm model from https://github.com/Unbabel/COMET/blob/master/MODELS.md:

```shell
wget https://unbabel-experimental-models.s3.amazonaws.com/comet/wmt21/wmt21-cometinho-mqm.tar.gz
tar xzf wmt21-cometinho-mqm.tar.gz
```
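A reproduction script of the kind described above would score each hypothesis with the downloaded checkpoint and then correlate the model scores against the human MQM scores. The sketch below is illustrative, not the original script: the checkpoint path inside the tarball and the toy data are assumptions, and `kendall_tau` is the plain tau-a variant (the paper may use a different tau formulation).

```python
# Scoring with COMET (requires the downloaded checkpoint; the path below is
# an assumption about the tarball layout -- adjust to the extracted files):
#
#   from comet import load_from_checkpoint
#   model = load_from_checkpoint("wmt21-cometinho-mqm/checkpoints/model.ckpt")
#   data = [{"src": s, "mt": m, "ref": r} for s, m, r in zip(srcs, mts, refs)]
#   model_scores = model.predict(data, batch_size=32, gpus=1).scores

def kendall_tau(x, y):
    """Plain Kendall tau (tau-a): (concordant - discordant) / total pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                concordant += 1
            elif prod < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy example: correlate model scores with human (MQM) scores.
model_scores = [0.1, 0.4, 0.35, 0.8]
mqm_scores = [-5.0, -1.0, -2.0, 0.0]
print(f"Kendall tau: {kendall_tau(model_scores, mqm_scores):.3f}")  # Kendall tau: 1.000
```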
Expected behaviour
The provided script should print
Kendall tau: 0.263
or a close number.
Environment
OS: Ubuntu Linux 20.04, kernel 5.17.5, inside a Docker container
Hardware: NVIDIA RTX 3090
Packaging: pip
Version: 2.0.0
Additional Comments
`torch.set_float32_matmul_precision("medium")` impacts only GPU utilization and processing speed; the scores are the same.