Garrafao / durel_system_annotators


Check TempoWiC data transformation #49

Open Garrafao opened 5 months ago

Garrafao commented 5 months ago
Latest scores after re-factoring:

| Data | accuracy | correlation | p-value |
|---|---|---|---|
| DWUG_DE | 0.778 | 0.516 | 0.0 |
| DWUG_EN | 0.756 | 0.499 | 0.0 |
| DWUG_SV | 0.764 | 0.447 | 0.0 |
| TempoWic_Train | 0.513 | -0.125 | 2.1678358774887747e-06 |
| TempoWic_Trial | 0.5 | -0.272 | 0.2456957956063111 |
| TempoWic_Validation | 0.51 | -0.118 | 0.018454072074192394 |
| Wic_Dev | 0.887 | 0.795 | 4.430455446790286e-140 |
| Wic_Test | 0.662 | 0.376 | 4.029358404835062e-48 |
| Wic_Train | 0.889 | 0.798 | 0.0 |
| testgug_en | 0.898 | 0.794 | 4.253365504873996e-214 |

| Annotator | Data | accuracy | correlation | p-value |
|---|---|---|---|---|
| XL-Lexeme-Multi-Threshold | dwug_de_median | NA | 0.601 | 0.0 |
| XL-Lexeme-Multi-Threshold | dwug_en_median | NA | 0.583 | 0.0 |
| XL-Lexeme-Multi-Threshold | dwug_sv_median | NA | 0.564 | 0.0 |
| XL-Lexeme-Multi-Threshold | tempowic_train | NA | -0.246 | 3.572459678427643e-21 |
| XL-Lexeme-Multi-Threshold | tempowic_trial | NA | -0.332 | 0.1521878698806116 |
| XL-Lexeme-Multi-Threshold | tempowic_validation | NA | -0.188 | 0.00017235823158717802 |
| XL-Lexeme-Multi-Threshold | testwug_en_transformed_median | NA | 0.825 | 6.220291975603303e-245 |
| XL-Lexeme-Multi-Threshold | wic_dev | NA | 0.909 | 3.082802754404488e-243 |
| XL-Lexeme-Multi-Threshold | wic_test | NA | 0.414 | 5.630338320254875e-59 |
| XL-Lexeme-Multi-Threshold | wic_train | NA | 0.915 | 0.0 |
| XL-Lexeme-Cosine | dwug_de | NA | 0.61 | 0.0 |
| XL-Lexeme-Cosine | dwug_de_median | NA | 0.61 | 0.0 |
| XL-Lexeme-Cosine | dwug_en | NA | 0.598 | 0.0 |
| XL-Lexeme-Cosine | dwug_en_median | NA | 0.598 | 0.0 |
| XL-Lexeme-Cosine | dwug_sv | NA | 0.573 | 0.0 |
| XL-Lexeme-Cosine | dwug_sv_median | NA | 0.573 | 0.0 |
| XL-Lexeme-Cosine | tempowic_train | NA | -0.31 | 4.32970849159845e-33 |
| XL-Lexeme-Cosine | tempowic_trial | NA | -0.389 | 0.08968881889529948 |
| XL-Lexeme-Cosine | tempowic_validation | NA | -0.205 | 4.092545817880166e-05 |
| XL-Lexeme-Cosine | testwug_en_transformed_median | NA | 0.814 | 3.323558053916412e-233 |
| XL-Lexeme-Cosine | testwug_en_transformed_binarize-median | NA | 0.802 | 1.1355171676745905e-221 |
| XL-Lexeme-Cosine | wic_dev | NA | 0.847 | 6.3474350881542155e-177 |
| XL-Lexeme-Cosine | wic_test | NA | 0.493 | 1.6089625897990044e-86 |
| XL-Lexeme-Cosine | wic_train | NA | 0.857 | 0.0 |

_Originally posted by @shafqatvirk in https://github.com/Garrafao/durel_system_annotators/issues/30#issuecomment-2003178495_

Garrafao commented 5 months ago

The TempoWiC results indicate that something is wrong with the data transformation: accuracy is at chance level (0.50 to 0.51) and every correlation is negative. For one, labels 0 and 1 could be swapped. Also, the target word indices could be wrong. It is not urgent, as results for the rest of the datasets are good, but at some point we should validate the TempoWiC transformation. @shafqatvirk Any thoughts?
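
A minimal sketch of how both suspicions could be checked, not the repository's actual code. The field names (`id`, `context`, `start`, `end`, `target`) and the helper names are assumptions made for illustration, and a plain Pearson correlation stands in for whatever correlation the evaluation script computes. The first check shows that flipping binary gold labels negates the correlation, so a consistently negative score is what a label swap would produce. The second check flags rows whose character span does not cover the target word.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation; stand-in for the evaluation's correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def flip_labels(labels):
    """Swap binary labels: 0 becomes 1 and 1 becomes 0."""
    return [1 - label for label in labels]


def span_mismatches(rows):
    """Return ids of rows whose [start, end) span does not cover the target.

    The match is deliberately loose (case-insensitive containment in either
    direction) because the surface form may be inflected.
    """
    bad = []
    for row in rows:
        surface = row["context"][row["start"]:row["end"]].lower()
        target = row["target"].lower()
        if not surface or (target not in surface and surface not in target):
            bad.append(row["id"])
    return bad


# Toy data (hypothetical): scores are high exactly where gold says 0.
gold = [1, 1, 0, 0]
scores = [0.2, 0.3, 0.8, 0.9]
print(pearson(gold, scores) < 0)               # negative with labels as given
print(pearson(flip_labels(gold), scores) > 0)  # positive once labels are flipped

rows = [
    {"id": "a", "context": "The frisk was brief.", "start": 4, "end": 9, "target": "frisk"},
    {"id": "b", "context": "The frisk was brief.", "start": 3, "end": 8, "target": "frisk"},
]
print(span_mismatches(rows))  # row "b" is off by one character
```

If flipping the TempoWiC labels turns the correlations positive at a similar magnitude to the WiC results, the label logic is the likely culprit. If many rows show span mismatches, the index computation should be checked first.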