ChangeIsKey / LSCDBenchmark


WiC Unit and Integration Test #45

Open arshandalili opened 4 months ago

arshandalili commented 4 months ago

Complete the Unit and Integration Tests for the WiC task.
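A minimal sketch of what such a unit test could look like, assuming a pytest-style layout; `DummyWICModel` and its `predict` method are hypothetical stand-ins, not the benchmark's actual interface:

```python
# Hypothetical WiC unit-test sketch. The real tests would exercise the
# benchmark's own model class; this dummy exists only to keep the
# example self-contained and runnable.

class DummyWICModel:
    """Trivial model: two word uses are 'same sense' iff the strings match."""

    def predict(self, use1: str, use2: str) -> float:
        return 1.0 if use1 == use2 else 0.0


def test_identical_uses_score_high():
    model = DummyWICModel()
    assert model.predict("bank", "bank") == 1.0


def test_different_uses_score_low():
    model = DummyWICModel()
    assert model.predict("bank", "river") == 0.0
```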

arshandalili commented 3 months ago

Adding the results below as integration tests:

Also, based on the paper:

[Screenshot of the paper's results, 2024-07-01 12:35 PM]
arshandalili commented 3 months ago

Problems with dataset loading: the failure occurs the first time a dataset is loaded in `test_on` mode.
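A regression test for this could be sketched as follows; `load_dataset`, its signature, and the mode names are assumptions made only to keep the sketch self-contained, not the benchmark's actual API:

```python
# Hypothetical guard against the first-load failure in `test_on` mode.
# The stub loader below stands in for the benchmark's real dataset code.

def load_dataset(name: str, mode: str) -> list:
    """Stub loader: returns placeholder WiC rows (use1, use2, judgment)."""
    if mode not in {"train", "test_on"}:
        raise ValueError(f"unknown mode: {mode}")
    return [("use1", "use2", 0.5)]  # placeholder rows


def test_first_load_in_test_on_mode():
    # The reported bug hits the *first* load, so a fresh call must succeed
    # and yield a non-empty dataset.
    data = load_dataset("dwug_en_200", mode="test_on")
    assert len(data) > 0
```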

arshandalili commented 3 months ago

Possible problems with benchmark:

nvanva commented 3 weeks ago

@arshandalili Arshan, please describe what exactly has been done regarding this issue.

arshandalili commented 2 weeks ago

The XL-Lexeme English results have been reproduced using this config:

[Screenshot of the config, 2024-09-28 9:41 PM]

Result:

[Screenshot of the result, 2024-09-28 10:10 PM]

nvanva commented 2 weeks ago

@arshandalili Two questions: 1) Is the config in your screenshot also committed to the repo? Could you please give a link? 2) When we talk about results being "reproduced", we usually mean that we have a script/command that gets the same results on the same dataset as some published or previously known results. I see you got a Spearman's correlation of 0.623 on dwug_en_200. Which results exactly does this reproduce?

Above, I see the following results we wanted to reproduce: "XL-Lexeme-Cosine dwug_en_median NA 0.598 0.0". What is the difference between dwug_en_median and dwug_en_200?
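The comparison being discussed, checking a reproduced score against a previously known one via Spearman's rank correlation, can be sketched as below; the gold/predicted values are made up for illustration, and a self-contained rank implementation is used in place of the benchmark's actual evaluation code:

```python
# Minimal Spearman's rank correlation, assuming distinct values (no ties),
# as used when comparing predicted graded WiC scores to gold judgments.

def spearman(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


gold = [0.9, 0.1, 0.5, 0.7, 0.3]  # illustrative gold judgments
pred = [0.8, 0.2, 0.6, 0.9, 0.1]  # illustrative model scores
print(round(spearman(gold, pred), 3))  # → 0.8
```

In practice the benchmark would compute this with `scipy.stats.spearmanr`, which also handles ties via average ranks.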

arshandalili commented 2 weeks ago

For XL-Lexeme, the difference in results may be due to the data, but the goal was to ensure that the model behaves reasonably, i.e., that it doesn't contain bugs; otherwise there would be a much larger difference.

Currently, the only way to use the config is to pass it when running the benchmark from the command line; refer to the README.

Also, refer to test_wic.py.