Why is mean pooling giving such bad performance?

MastafaF commented 4 years ago

! sh similarity_distilBERT_batch.sh 50 mean True

Output:

Confusion matrix:
langs   cs       de       en       es       fr       avg     
cs     0.00%   99.97%   99.97%   99.97%   99.97%   99.97%
de   100.00%    0.00%  100.00%  100.00%  100.00%  100.00%
en    99.97%   99.97%    0.00%   99.97%   99.97%   99.97%
es    99.97%   99.97%   99.93%    0.00%  100.00%   99.97%
fr    99.97%   99.97%   99.97%   99.97%    0.00%   99.97%
avg   99.98%   99.97%   99.97%   99.98%   99.98%   99.97%

MastafaF commented 4 years ago

Solved! The bug was in the sum(): after multiplying the input embeddings by the attention_mask, the average has to be taken over the real tokens only, not over the padding tokens.
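The thread does not show the script's pooling code, so here is a minimal NumPy sketch of masked mean pooling, assuming token embeddings of shape (batch, seq_len, hidden) and a 0/1 attention_mask of shape (batch, seq_len). The function names are hypothetical, and whether this matches the exact bug in similarity_distilBERT_batch.sh is an assumption. Padding positions still get hidden states from the model, so a plain mean over all positions lets them leak into the sentence embedding.

```python
import numpy as np


def naive_mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical buggy variant: averages over every position, padding included."""
    return token_embeddings.mean(axis=1)


def masked_mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average over real tokens only.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding (assumed convention)
    """
    # Add a trailing axis so the mask broadcasts over the hidden dimension.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    # Zero out padding positions, then sum over the token axis.
    summed = (token_embeddings * mask).sum(axis=1)
    # Divide by the number of real tokens per sentence (clipped to avoid division by zero).
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


if __name__ == "__main__":
    # One sentence with two real tokens and one padding position (toy values).
    emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
    mask = np.array([[1, 1, 0]])
    print(masked_mean_pool(emb, mask))  # [[2. 3.]]
    print(naive_mean_pool(emb))  # padding drags the average far away from the real tokens
```

The key design choice is dividing by the per-sentence count of real tokens rather than by seq_len, so sentences padded to the same batch length still get comparable embeddings.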

MastafaF commented 4 years ago

Output with the same input, now that the issue is solved:

Confusion matrix:
langs   cs       de       en       es       fr       avg     
cs     0.00%   84.18%   85.98%   82.65%   89.08%   85.47%
de    74.79%    0.00%   61.51%   65.23%   67.27%   67.20%
en    75.72%   55.04%    0.00%   26.71%   31.60%   47.27%
es    78.39%   81.75%   37.40%    0.00%   55.54%   63.27%
fr    75.36%   63.20%   35.16%   34.33%    0.00%   52.01%
avg   76.07%   71.05%   55.01%   52.23%   60.87%   63.05%

Input:

sh similarity_distilBERT_batch.sh 100 mean True

Output:

Confusion matrix:
langs   cs       de       en       es       fr       avg     
cs     0.00%   84.85%   87.71%   86.85%   88.71%   87.03%
de    75.06%    0.00%   64.20%   70.36%   69.13%   69.69%
en    74.16%   54.28%    0.00%   24.84%   29.14%   45.60%
es    75.99%   81.62%   40.86%    0.00%   50.95%   62.35%
fr    90.08%   78.16%   37.53%   56.88%    0.00%   65.66%
avg   78.82%   74.73%   57.58%   59.73%   59.48%   66.07%

MastafaF / multilingual_similarity_compare, issue #6