Garrafao opened this issue 5 months ago (status: Open)
The TempoWiC results suggest that something is wrong with the data transformation: accuracy is at chance level (about 0.5) and the correlations are negative, while all other data sets show clearly positive correlations. One possible cause is that labels 0 and 1 are swapped. Another is that the target word indices are wrong. This is not urgent, since the results for the rest of the data sets are good, but at some point we should validate the TempoWiC transformation. @shafqatvirk Any thoughts?
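One way to test the index hypothesis is to slice each context with its stored offsets and see whether the result looks like the target word. The sketch below is only a rough heuristic: the `(use_id, context, start, end)` layout, the function names, and the example sentences are all made up for illustration and are not taken from the actual TempoWiC transformation code.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout of one transformed use: (use_id, context, start, end).
Use = Tuple[str, str, int, int]


def span_matches(context: str, start: int, end: int, lemma: str) -> bool:
    """Crude check that context[start:end] looks like a form of `lemma`."""
    if not (0 <= start < end <= len(context)):
        return False                      # offsets fall outside the context
    span = context[start:end]
    if span != span.strip():
        return False                      # span bleeds into surrounding whitespace
    prefix = lemma.lower()[:3]            # short prefix only, to tolerate inflection
    return span.lower().lstrip("#@").startswith(prefix)


def suspicious_uses(uses: Iterable[Use], lemma: str) -> List[str]:
    """Return the ids of uses whose target span does not look like `lemma`."""
    return [uid for uid, ctx, s, e in uses if not span_matches(ctx, s, e, lemma)]


# Invented examples: u2 has offsets shifted by one character.
uses = [
    ("u1", "The frisk policy was debated again.", 4, 9),
    ("u2", "The frisk policy was debated again.", 5, 10),
    ("u3", "Stop and #frisk is back.", 9, 15),
]
print(suspicious_uses(uses, "frisk"))
```

If a large share of the TempoWiC uses were flagged by a check like this, that would point to the offsets rather than the labels.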

| Data | accuracy | correlation | p-value |
|---|---|---|---|
| DWUG_DE | 0.778 | 0.516 | 0.0 |
| DWUG_EN | 0.756 | 0.499 | 0.0 |
| DWUG_SV | 0.764 | 0.447 | 0.0 |
| TempoWic_Train | 0.513 | -0.125 | 2.1678358774887747e-06 |
| TempoWic_Trial | 0.5 | -0.272 | 0.2456957956063111 |
| TempoWic_Validation | 0.51 | -0.118 | 0.018454072074192394 |
| Wic_Dev | 0.887 | 0.795 | 4.430455446790286e-140 |
| Wic_Test | 0.662 | 0.376 | 4.029358404835062e-48 |
| Wic_Train | 0.889 | 0.798 | 0.0 |
| testgug_en | 0.898 | 0.794 | 4.253365504873996e-214 |


| Annotator | Data | accuracy | correlation | p-value |
|---|---|---|---|---|
| XL-Lexeme-Multi-Threshold | dwug_de_median | NA | 0.601 | 0.0 |
| XL-Lexeme-Multi-Threshold | dwug_en_median | NA | 0.583 | 0.0 |
| XL-Lexeme-Multi-Threshold | dwug_sv_median | NA | 0.564 | 0.0 |
| XL-Lexeme-Multi-Threshold | tempowic_train | NA | -0.246 | 3.572459678427643e-21 |
| XL-Lexeme-Multi-Threshold | tempowic_trial | NA | -0.332 | 0.1521878698806116 |
| XL-Lexeme-Multi-Threshold | tempowic_validation | NA | -0.188 | 0.00017235823158717802 |
| XL-Lexeme-Multi-Threshold | testwug_en_transformed_median | NA | 0.825 | 6.220291975603303e-245 |
| XL-Lexeme-Multi-Threshold | wic_dev | NA | 0.909 | 3.082802754404488e-243 |
| XL-Lexeme-Multi-Threshold | wic_test | NA | 0.414 | 5.630338320254875e-59 |
| XL-Lexeme-Multi-Threshold | wic_train | NA | 0.915 | 0.0 |
| XL-Lexeme-Cosine | dwug_de | NA | 0.61 | 0.0 |
| XL-Lexeme-Cosine | dwug_de_median | NA | 0.61 | 0.0 |
| XL-Lexeme-Cosine | dwug_en | NA | 0.598 | 0.0 |
| XL-Lexeme-Cosine | dwug_en_median | NA | 0.598 | 0.0 |
| XL-Lexeme-Cosine | dwug_sv | NA | 0.573 | 0.0 |
| XL-Lexeme-Cosine | dwug_sv_median | NA | 0.573 | 0.0 |
| XL-Lexeme-Cosine | tempowic_train | NA | -0.31 | 4.32970849159845e-33 |
| XL-Lexeme-Cosine | tempowic_trial | NA | -0.389 | 0.08968881889529948 |
| XL-Lexeme-Cosine | tempowic_validation | NA | -0.205 | 4.092545817880166e-05 |
| XL-Lexeme-Cosine | testwug_en_transformed_median | NA | 0.814 | 3.323558053916412e-233 |
| XL-Lexeme-Cosine | testwug_en_transformed_binarize-median | NA | 0.802 | 1.1355171676745905e-221 |
| XL-Lexeme-Cosine | wic_dev | NA | 0.847 | 6.3474350881542155e-177 |
| XL-Lexeme-Cosine | wic_test | NA | 0.493 | 1.6089625897990044e-86 |
| XL-Lexeme-Cosine | wic_train | NA | 0.857 | 0.0 |

_Originally posted by @shafqatvirk in https://github.com/Garrafao/durel_system_annotators/issues/30#issuecomment-2003178495_
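To test the label hypothesis, we could rescore the same predictions against inverted gold labels and compare. The sketch below is a minimal illustration on invented data. It uses a hand-written Pearson correlation as a stand-in, since the issue does not say which coefficient the evaluation uses. The function names and the 0.5 threshold are assumptions too.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation (stand-in for the real evaluation metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of predictions that equal the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def polarity_report(
    scores: Sequence[float], gold: Sequence[int], threshold: float = 0.5
) -> Dict[str, Tuple[float, float]]:
    """Score predictions against the gold labels as-is and with 0/1 swapped."""
    pred = [1 if s >= threshold else 0 for s in scores]  # assumed binarization
    flipped = [1 - g for g in gold]
    return {
        "as_is": (accuracy(pred, gold), pearson(scores, gold)),
        "flipped": (accuracy(pred, flipped), pearson(scores, flipped)),
    }


# Invented toy data in which the gold labels are deliberately inverted.
scores = [0.9, 0.8, 0.2, 0.1]
gold = [0, 0, 1, 1]
report = polarity_report(scores, gold)
print(report)
```

Swapping binary labels only negates the correlation and turns an accuracy of `a` into `1 - a`. If the evaluation uses a standard coefficient, a swap alone would move TempoWic_Train from -0.125 to 0.125, which is still well below the other data sets. So the target word indices are probably worth checking as well.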