arshandalili opened 4 months ago
Adding the below results as integration tests:
Also, based on the paper:
Problems with dataset loading: this happens when we first load a dataset in test_on mode.
Possible problems with benchmark:
@arshandalili Arshan, please describe exactly what has been done regarding this issue.
The XL-Lexeme en results have been reproduced using this config:
Result:
@arshandalili Two questions: 1) Is the config in your screenshot also committed to the repo? Could you please give a link? 2) When we talk about results being "reproduced", we usually mean that we have a script/command that gets the same results on the same dataset as some published or previously known results. I see you got a Spearman's correlation of 0.623 on dwug_en_200. Which results exactly does this reproduce?
I see above the following results we wanted to reproduce: "XL-Lexeme-Cosine dwug_en_median NA 0.598 0.0". What is the difference between dwug_en_median and dwug_en_200?
For XL-Lexeme, the difference in results may be due to the data, but the goal was to ensure that the model works reasonably, i.e., that it doesn't contain bugs; otherwise there would be a huge difference.
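For context, the scores being compared (0.623 vs. 0.598) are Spearman rank correlations between model-predicted and gold graded scores. A minimal sketch of the metric, using made-up data (the benchmark itself presumably calls `scipy.stats.spearmanr`):

```python
def spearman(xs, ys):
    """Spearman's rho via the simplified formula (assumes no tied values)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

gold = [0.10, 0.40, 0.35, 0.80]  # hypothetical gold graded scores
pred = [0.20, 0.30, 0.50, 0.70]  # hypothetical model scores
print(spearman(gold, pred))  # → 0.8
```

Because the metric only depends on rank order, small numeric differences between runs (e.g. from data versions) can still yield similar correlations, which is why a gap like 0.623 vs. 0.598 is plausible without a bug.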
Currently, the only way to use a config is to pass it to a command; refer to the README.
Also, refer to test_wic.py.
Complete the Unit and Integration Tests for the WiC task.
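As a starting point, a WiC test could look something like the sketch below. This is a hypothetical outline, not the repo's actual API: `wic_predict` is a stand-in for the real model call in test_wic.py, and the example pair is invented.

```python
# Hypothetical sketch of a WiC (Word-in-Context) test. The task: given two
# sentences containing the same target word, decide whether the word is used
# in the same sense. `wic_predict` is a placeholder, NOT the repo's API.
import pytest


def wic_predict(sentence1: str, sentence2: str, target: str) -> bool:
    """Stand-in for the real model call; returns True for "same sense".

    Trivial placeholder logic so the sketch runs: a real test would
    invoke the XL-Lexeme (or other) model here.
    """
    return target in sentence1.split() and target in sentence2.split()


@pytest.mark.parametrize(
    "s1, s2, target, expected",
    [
        ("He deposited cash at the bank", "The bank approved the loan", "bank", True),
    ],
)
def test_wic_prediction(s1, s2, target, expected):
    assert wic_predict(s1, s2, target) == expected
```

An integration test in the same shape would load a small fixture dataset, run the full pipeline, and compare the resulting Spearman correlation against the known reference value within a tolerance.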