Open kongds opened 1 year ago
Hello, I've met the same problem, but I could not get the right results for W1A1 (around 52 accuracy on RTE), and when I try to train W1A2, the result is even worse (50%). May I ask if you have tried to reproduce RTE?
I didn't run RTE, but I have reproduced STS-B. The result of W1A1 is around 67.0, compared to 71.1 in the paper.
The results of STS-B are 67.7 (W1A1 w/o multi-distill), 73.5 (W1A2), and 58.0 (W1A1 w/ multi-distill), still lower than the paper. I didn't use data parallel.
It seems that we cannot reproduce the STS-B result. The settings for STS-B are: https://github.com/facebookresearch/bit/blob/071a9749e024e8e151c55adbeb6ef3aaf5b8a283/utils_glue.py#L689 According to the paper, the authors used grid search to get the STS-B result.
Hello, I've met the same problem: I also could not get the right results for W1A1 STS-B (around 68, compared to 71 reported in the paper). May I ask whether you have figured out the reason? @kongds
Hi, I still can't get the correct result for W1A1 STS-B and don't know why.
That is also difficult for me. I have tried most W1A2 experiments as well (with a clear accuracy gap) and want to cite and compare BiT in my paper, but the accuracy gap really confuses me now.
I cannot get the accuracy shown in the paper on most W1A2 or W1A4 tasks; the accuracy gap is about 10 points.
Maybe the released version is not the optimal version.
I can reproduce the 1-1-1 BERT for all datasets without multi-distillation. But for 1-1-4 and 1-1-2 BERT, my results are way off. Is anyone @kongds @NicoNico6 @TTTTTTris @likethesky @Celebio getting the same thing?
Hi, I also found this problem.
Besides, I tried to evaluate the released pre-trained models, but I cannot get the accuracy reported in the README table. For example, when data augmentation is used, the reported accuracy of the released pretrained models is RTE: 69.7, MRPC: 88, STS-B: 84.2.
However, when I ran the evaluation on the released models myself, the corresponding performance was RTE: 66 vs 69.7, MRPC: 85.5 vs 88, STS-B: 82.3 vs 84.2.
Did you find the same issue?
Have you tried doing a grid search over the hyperparameters to see if it works?
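For reference, such a grid search usually just sweeps a few hyperparameters and keeps the best dev score. A minimal sketch; the learning-rate/batch-size values and the extra arguments passed to `run_glue.sh` are illustrative assumptions, not the paper's actual search space or the repo's actual interface:

```python
import itertools

# Illustrative search space -- the paper does not publish its exact grid,
# so these values are assumptions, not the authors' settings.
learning_rates = ["1e-5", "2e-5", "5e-5"]
batch_sizes = ["16", "32"]

# Build one command per hyperparameter combination.
commands = [
    ["bash", "scripts/run_glue.sh", "STS-B", lr, bs]  # hypothetical extra args
    for lr, bs in itertools.product(learning_rates, batch_sizes)
]
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to launch
```

You would then pick the run with the best dev-set metric, which may explain part of the gap to the paper's numbers.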
Hello, thank you for providing the code.
I can get the right results for W1A1 with `bash scripts/run_glue.sh MNLI` (around 77 accuracy on MNLI). But when I reproduce W1A1 with the multi-distillation approach (W32A32 -> W1A2 -> W1A1), I cannot reproduce the W1A2 results in the paper by simply changing `abits=1` to `abits=2` in `scripts/run_glue.sh` (the W1A2 result I get is 80.96/81.36). Can you share the detailed settings of the multi-distillation approach?