Hello, and thank you very much for your contribution to the field and for open-sourcing the code.
I am trying to reproduce the Table 2 results from the paper using the code specified here. I had to add a value for -max_seq_length because the command would not run otherwise. I also train for 40k steps instead of 20k steps, as specified in the paper. Otherwise, I am running exactly the same command.
The results I obtain are different from the ones shown in Table 2 of the paper. Here is what I obtain:
Do you have any guidance on what I might be doing wrong? Could it be that I'm not using the correct initial BLEURT checkpoint?
Thanks a lot!