Add distilBERT with batches and test it

MastafaF commented 4 years ago

Test of distilBERT with batch and GPU support.

Input:

! sh similarity_distilBERT_batch.sh 40 cls True

Output:

Confusion matrix:

langs   cs       de       en       es       fr       avg     
cs     0.00%   92.64%   98.43%   92.57%   94.44%   94.52%
de    77.66%    0.00%   91.08%   76.99%   80.25%   81.49%
en    83.82%   71.46%    0.00%   54.08%   61.44%   67.70%
es    79.65%   82.88%   88.68%    0.00%   69.70%   80.23%
fr    79.39%   82.32%   87.01%   60.81%    0.00%   77.38%
avg   80.13%   82.33%   91.30%   71.11%   76.46%   80.26%

Previously, processing one sentence at a time on CPU, we had:

Input:

! sh similarity_distilBERT.sh 40 cls

Output:

langs   de       en       es       fr       ru       avg     
de     0.00%   91.08%   76.99%   80.25%   84.75%   83.27%
en    71.46%    0.00%   54.08%   61.44%   67.83%   63.70%
es    82.88%   88.68%    0.00%   69.70%   78.75%   80.00%
fr    82.32%   87.01%   60.81%    0.00%   78.95%   77.27%
ru    86.15%   91.91%   76.26%   80.82%    0.00%   83.78%
avg   80.70%   89.67%   67.03%   73.05%   77.57%   77.61%

MastafaF commented 4 years ago

Looking at pairs like ('en', 'es'), we can see that batching on GPU yields the same results as processing one sentence at a time on CPU.
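The claim above can be checked mechanically. The following is a minimal sketch, not part of the repository: it copies the two matrices posted in this thread into strings, parses them, and compares every language pair that appears in both runs. The helper names `parse_matrix` and `shared_pairs` are hypothetical and introduced only for illustration.

```python
def parse_matrix(text):
    """Parse a whitespace-aligned matrix into {(row_lang, col_lang): percent}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    cols = lines[0][1:]  # drop the "langs" header cell
    return {
        (row[0], col): float(cell.rstrip("%"))
        for row in lines[1:]
        for col, cell in zip(cols, row[1:])
    }


def shared_pairs(a, b):
    """Language pairs present in both matrices, ignoring the avg row and column."""
    return sorted(k for k in a.keys() & b.keys() if "avg" not in k)


# Output of: sh similarity_distilBERT_batch.sh 40 cls True (copied from this thread)
BATCH_GPU = """
langs   cs       de       en       es       fr       avg
cs     0.00%   92.64%   98.43%   92.57%   94.44%   94.52%
de    77.66%    0.00%   91.08%   76.99%   80.25%   81.49%
en    83.82%   71.46%    0.00%   54.08%   61.44%   67.70%
es    79.65%   82.88%   88.68%    0.00%   69.70%   80.23%
fr    79.39%   82.32%   87.01%   60.81%    0.00%   77.38%
avg   80.13%   82.33%   91.30%   71.11%   76.46%   80.26%
"""

# Output of: sh similarity_distilBERT.sh 40 cls (copied from this thread)
SINGLE_CPU = """
langs   de       en       es       fr       ru       avg
de     0.00%   91.08%   76.99%   80.25%   84.75%   83.27%
en    71.46%    0.00%   54.08%   61.44%   67.83%   63.70%
es    82.88%   88.68%    0.00%   69.70%   78.75%   80.00%
fr    82.32%   87.01%   60.81%    0.00%   78.95%   77.27%
ru    86.15%   91.91%   76.26%   80.82%    0.00%   83.78%
avg   80.70%   89.67%   67.03%   73.05%   77.57%   77.61%
"""

batch = parse_matrix(BATCH_GPU)
single = parse_matrix(SINGLE_CPU)
pairs = shared_pairs(batch, single)
mismatches = [p for p in pairs if batch[p] != single[p]]
print(len(pairs), "shared pairs,", len(mismatches), "mismatches")
```

Only de, en, es and fr appear in both runs, so the comparison covers those pairs. The avg entries are excluded because they are computed over different language sets (cs in one run, ru in the other) and are therefore not expected to match.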

MastafaF commented 4 years ago

Input:

! sh similarity_distilBERT_batch.sh 100 cls True

Output:

Confusion matrix:
langs   cs       de       en       es       fr       avg     
cs     0.00%   89.84%   98.07%   89.88%   91.94%   92.43%
de    72.86%    0.00%   88.68%   71.33%   73.63%   76.62%
en    81.55%   64.47%    0.00%   42.99%   51.85%   60.21%
es    77.46%   78.82%   85.55%    0.00%   60.11%   75.48%
fr    75.42%   76.76%   83.82%   50.18%    0.00%   71.55%
avg   76.82%   77.47%   89.03%   63.59%   69.38%   75.26%

Input:

! sh similarity_distilBERT_batch.sh 40 cls True

Output:

Confusion matrix:
langs   cs       de       en       es       fr       avg     
cs     0.00%   92.64%   98.43%   92.57%   94.44%   94.52%
de    77.66%    0.00%   91.08%   76.99%   80.25%   81.49%
en    83.82%   71.46%    0.00%   54.08%   61.44%   67.70%
es    79.65%   82.88%   88.68%    0.00%   69.70%   80.23%
fr    79.39%   82.32%   87.01%   60.81%    0.00%   77.38%
avg   80.13%   82.33%   91.30%   71.11%   76.46%   80.26%
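To read the two runs above side by side, the per-pair differences can be computed directly from the posted numbers. This is a rough sketch under stated assumptions: the thread does not say what the first argument (40 versus 100) controls, so the code only reports arithmetic differences and does not interpret them. The helper `parse_matrix` is hypothetical and not part of the repository.

```python
def parse_matrix(text):
    """Parse a whitespace-aligned matrix into {(row_lang, col_lang): percent}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    cols = lines[0][1:]  # drop the "langs" header cell
    return {
        (row[0], col): float(cell.rstrip("%"))
        for row in lines[1:]
        for col, cell in zip(cols, row[1:])
    }


# Output of: sh similarity_distilBERT_batch.sh 100 cls True (copied from this thread)
RUN_100 = """
langs   cs       de       en       es       fr       avg
cs     0.00%   89.84%   98.07%   89.88%   91.94%   92.43%
de    72.86%    0.00%   88.68%   71.33%   73.63%   76.62%
en    81.55%   64.47%    0.00%   42.99%   51.85%   60.21%
es    77.46%   78.82%   85.55%    0.00%   60.11%   75.48%
fr    75.42%   76.76%   83.82%   50.18%    0.00%   71.55%
avg   76.82%   77.47%   89.03%   63.59%   69.38%   75.26%
"""

# Output of: sh similarity_distilBERT_batch.sh 40 cls True (copied from this thread)
RUN_40 = """
langs   cs       de       en       es       fr       avg
cs     0.00%   92.64%   98.43%   92.57%   94.44%   94.52%
de    77.66%    0.00%   91.08%   76.99%   80.25%   81.49%
en    83.82%   71.46%    0.00%   54.08%   61.44%   67.70%
es    79.65%   82.88%   88.68%    0.00%   69.70%   80.23%
fr    79.39%   82.32%   87.01%   60.81%    0.00%   77.38%
avg   80.13%   82.33%   91.30%   71.11%   76.46%   80.26%
"""

run_100 = parse_matrix(RUN_100)
run_40 = parse_matrix(RUN_40)

# Difference (100 run minus 40 run) for each off-diagonal language pair,
# skipping the avg row and column.
deltas = {
    pair: round(run_100[pair] - run_40[pair], 2)
    for pair in run_40
    if "avg" not in pair and pair[0] != pair[1]
}

# Print pairs from the largest decrease to the smallest.
for pair, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(pair, f"{delta:+.2f}")
```

On these numbers every off-diagonal entry is lower in the 100 run than in the 40 run, and the largest change is for the ('en', 'es') pair.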

MastafaF / multilingual_similarity_compare

Add distilBERT with batches and test it #4