Wikidata / soweego

Link Wikidata items to large catalogs
GNU General Public License v3.0

Reevaluate performance of base classifiers #351

Closed: tupini07 closed this issue 5 years ago

tupini07 commented 5 years ago

Reevaluate the performance of the base classifiers after finding their optimal hyperparameters via grid search (#361).
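For readers unfamiliar with the technique, grid search tries every combination of candidate hyperparameter values and keeps the best-scoring one. Below is a minimal standard-library sketch of that idea. The parameter names, the grid values, and `toy_score` are hypothetical placeholders; they are not the grid or the scoring used in #361, where the score would come from training and evaluating a real classifier.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best one.

    param_grid maps a parameter name to the list of values to try;
    score_fn takes a dict of parameters and returns a score (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train a classifier and return its mean F1".
def toy_score(params):
    return -abs(params["C"] - 1.0) - 0.1 * abs(params["max_iter"] - 200) / 100


# Hypothetical grid; NOT the values explored in #361.
param_grid = {"C": [0.1, 1.0, 10.0], "max_iter": [100, 200]}
best_params, best_score = grid_search(param_grid, toy_score)
```

The cost grows with the product of the list lengths, so each extra parameter multiplies the number of training runs.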

NOTE: Add graph and summary for classifications

tupini07 commented 5 years ago

For Linking

Continuous classifiers


Binary classifiers


| Catalog | Entity | Model | Count | Mean | STD | Min | 25% | 50% | 75% | Max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| discogs | band | linear_support_vector_machines | 92137 | 0.064480 | 0.245607 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| discogs | musician | linear_support_vector_machines | 515082 | 0.032108 | 0.176286 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | writer | linear_support_vector_machines | 68702 | 0.053259 | 0.224551 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | musician | linear_support_vector_machines | 944196 | 0.018987 | 0.136477 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | director | linear_support_vector_machines | 35094 | 0.083775 | 0.277054 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | producer | linear_support_vector_machines | 9351 | 0.094001 | 0.291845 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | actor | linear_support_vector_machines | 355515 | 0.085555 | 0.279706 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| musicbrainz | musician | linear_support_vector_machines | 652175 | 0.029915 | 0.170354 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| musicbrainz | band | linear_support_vector_machines | 106026 | 0.055864 | 0.229659 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| discogs | musician | logistic_regression | 515082 | 0.026798 | 0.125459 | 0.000000 | 0.000681 | 0.001240 | 0.002344 | 0.999997 |
| discogs | band | logistic_regression | 92137 | 0.056130 | 0.165189 | 0.000000 | 0.004231 | 0.006317 | 0.010508 | 0.999984 |
| imdb | director | logistic_regression | 35094 | 0.066687 | 0.194344 | 0.000132 | 0.001263 | 0.002436 | 0.006390 | 0.999955 |
| imdb | producer | logistic_regression | 9351 | 0.076076 | 0.200209 | 0.000220 | 0.001204 | 0.002088 | 0.004567 | 0.999668 |
| imdb | actor | logistic_regression | 355515 | 0.068824 | 0.188005 | 0.000236 | 0.002456 | 0.003915 | 0.006918 | 0.999943 |
| imdb | musician | logistic_regression | 944196 | 0.019788 | 0.089113 | 0.000075 | 0.000567 | 0.000977 | 0.001920 | 0.999975 |
| imdb | writer | logistic_regression | 68702 | 0.044311 | 0.157563 | 0.000114 | 0.000965 | 0.001704 | 0.003226 | 0.999800 |
| musicbrainz | band | logistic_regression | 106026 | 0.077057 | 0.218408 | 0.000000 | 0.001852 | 0.002936 | 0.006318 | 1.000000 |
| musicbrainz | musician | logistic_regression | 652175 | 0.028386 | 0.126628 | 0.000000 | 0.001425 | 0.002688 | 0.004012 | 1.000000 |
| discogs | band | multi_layer_perceptron | 92137 | 0.947446 | 0.169248 | 0.000000 | 0.994668 | 0.999367 | 0.999794 | 0.999977 |
| discogs | musician | multi_layer_perceptron | 515082 | 0.974131 | 0.125070 | 0.000000 | 0.998604 | 0.999678 | 0.999942 | 0.999999 |
| imdb | director | multi_layer_perceptron | 70188 | 0.934691 | 0.192393 | 0.002369 | 0.996334 | 0.999077 | 0.999737 | 0.999995 |
| imdb | writer | multi_layer_perceptron | 137404 | 0.954458 | 0.160987 | 0.004722 | 0.997428 | 0.999359 | 0.999802 | 0.999995 |
| imdb | producer | multi_layer_perceptron | 18702 | 0.922224 | 0.204272 | 0.000604 | 0.996273 | 0.999268 | 0.999751 | 0.999984 |
| imdb | actor | multi_layer_perceptron | 355515 | 0.932479 | 0.185652 | 0.002369 | 0.996054 | 0.998967 | 0.999477 | 0.999999 |
| imdb | musician | multi_layer_perceptron | 944196 | 0.982765 | 0.081712 | 0.001360 | 0.999014 | 0.999708 | 0.999881 | 0.999997 |
| musicbrainz | musician | multi_layer_perceptron | 652175 | 0.766345 | 0.411756 | 0.000000 | 0.820425 | 0.999451 | 0.999733 | 1.000000 |
| musicbrainz | band | multi_layer_perceptron | 106026 | 0.785790 | 0.377086 | 0.000000 | 0.709337 | 0.999467 | 0.999908 | 1.000000 |
| discogs | musician | naive_bayes | 515082 | 0.034216 | 0.161297 | 0.000000 | 0.000319 | 0.000319 | 0.005422 | 0.999999 |
| discogs | band | naive_bayes | 92137 | 0.069714 | 0.205292 | 0.000000 | 0.001541 | 0.013942 | 0.013942 | 0.999995 |
| imdb | musician | naive_bayes | 944196 | 0.025071 | 0.111446 | 0.000000 | 0.000319 | 0.001574 | 0.001574 | 0.999999 |
| imdb | writer | naive_bayes | 68702 | 0.043241 | 0.152733 | 0.000000 | 0.000355 | 0.001883 | 0.001883 | 0.999999 |
| imdb | actor | naive_bayes | 355515 | 0.070022 | 0.192806 | 0.000000 | 0.000700 | 0.004079 | 0.004079 | 0.999999 |
| imdb | producer | naive_bayes | 9351 | 0.070559 | 0.184460 | 0.000000 | 0.000417 | 0.001824 | 0.001824 | 0.999993 |
| imdb | director | naive_bayes | 35094 | 0.060302 | 0.180137 | 0.000000 | 0.000438 | 0.002258 | 0.002790 | 0.999998 |
| musicbrainz | band | naive_bayes | 106026 | 0.313230 | 0.366327 | 0.000000 | 0.000063 | 0.006630 | 0.673986 | 1.000000 |
| musicbrainz | musician | naive_bayes | 652175 | 0.013899 | 0.101066 | 0.000000 | 0.000012 | 0.000096 | 0.000096 | 1.000000 |
| discogs | band | random_forest | 92137 | 0.053361 | 0.180183 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| discogs | musician | random_forest | 515082 | 0.024329 | 0.127647 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | producer | random_forest | 9351 | 0.081674 | 0.221939 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | writer | random_forest | 68702 | 0.045897 | 0.167953 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | director | random_forest | 35094 | 0.067178 | 0.204324 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | musician | random_forest | 944196 | 0.018942 | 0.096642 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| imdb | actor | random_forest | 355515 | 0.070115 | 0.202583 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| musicbrainz | musician | random_forest | 652175 | 0.025781 | 0.128633 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| musicbrainz | band | random_forest | 106026 | 0.090595 | 0.229891 | 0.000000 | 0.000000 | 0.000000 | 0.004000 | 1.000000 |
| discogs | musician | single_layer_perceptron | 515082 | 0.973132 | 0.125103 | 0.000000 | 0.997566 | 0.998666 | 0.999229 | 0.999829 |
| discogs | band | single_layer_perceptron | 92137 | 0.943075 | 0.165641 | 0.000000 | 0.989174 | 0.993207 | 0.995274 | 0.998551 |
| imdb | writer | single_layer_perceptron | 68702 | 0.955303 | 0.161966 | 0.000147 | 0.997567 | 0.998731 | 0.999296 | 0.999923 |
| imdb | actor | single_layer_perceptron | 355515 | 0.930660 | 0.189850 | 0.000052 | 0.993078 | 0.996077 | 0.997540 | 0.999728 |
| imdb | director | single_layer_perceptron | 35094 | 0.931354 | 0.200875 | 0.000025 | 0.994105 | 0.997881 | 0.998906 | 0.999895 |
| imdb | musician | single_layer_perceptron | 944196 | 0.980424 | 0.089165 | 0.000026 | 0.998115 | 0.999061 | 0.999472 | 0.999938 |
| imdb | producer | single_layer_perceptron | 9351 | 0.922776 | 0.206773 | 0.000165 | 0.996212 | 0.998406 | 0.999139 | 0.999876 |
| musicbrainz | musician | single_layer_perceptron | 652175 | 0.763318 | 0.411444 | 0.000000 | 0.795643 | 0.996459 | 0.997635 | 1.000000 |
| musicbrainz | band | single_layer_perceptron | 106026 | 0.805367 | 0.372603 | 0.000000 | 0.913607 | 0.996555 | 0.998851 | 1.000000 |
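The columns above (Count, Mean, STD, Min, quartiles, Max) have the same shape as a describe-style summary, such as the one pandas `DataFrame.describe()` produces. The thread does not say how the table was generated, so treat that as an assumption. Under it, each row could be computed from the list of prediction scores for one catalog/entity/model triple roughly like this; `summarize` and `toy_scores` are hypothetical names and data, not soweego code.

```python
import statistics


def summarize(scores):
    """Return describe-style summary statistics for a list of scores."""
    # "inclusive" interpolates linearly between the observed min and max.
    q25, q50, q75 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "Count": len(scores),
        "Mean": statistics.fmean(scores),
        "STD": statistics.stdev(scores),  # sample standard deviation
        "Min": min(scores),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "Max": max(scores),
    }


# Hypothetical scores, NOT taken from the table above.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
summary = summarize(toy_scores)
```

Read this way, a row whose 25%, 50% and 75% are all 0.000000 means that at least three quarters of the scores are zero.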

Below is a summary of the values averaged over all nine catalog/entity pairs for each model:

| Model | Average Mean | Average STD | Average 25% | Average 50% | Average 75% | Average Max |
| --- | --- | --- | --- | --- | --- | --- |
| single_layer_perceptron | 0.911712 | 0.213713 | 0.963896 | 0.997227 | 0.998371 | 0.999749 |
| multi_layer_perceptron | 0.911148 | 0.212020 | 0.945349 | 0.999371 | 0.999781 | 0.999994 |
| naive_bayes | 0.077806 | 0.183952 | 0.000463 | 0.003623 | 0.078400 | 0.999998 |
| linear_support_vector_machines | 0.057549 | 0.225727 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| random_forest | 0.053097 | 0.173311 | 0.000000 | 0.000000 | 0.000444 | 1.000000 |
| logistic_regression | 0.051562 | 0.162769 | 0.001627 | 0.002700 | 0.005134 | 0.999925 |

tupini07 commented 5 years ago

For Evaluation

| Catalog | Entity | Model | F1.Mean | F1.STD | Prec.Mean | Prec.STD | Recall.Mean | Recall.STD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| discogs | band | linear_support_vector_machines | 0.925262 | 0.001267 | 0.896373 | 0.002110 | 0.956090 | 0.003388 |
| discogs | musician | linear_support_vector_machines | 0.936944 | 0.001224 | 0.905424 | 0.002080 | 0.970741 | 0.000880 |
| imdb | writer | linear_support_vector_machines | 0.928606 | 0.001983 | 0.919005 | 0.002274 | 0.938421 | 0.003574 |
| imdb | actor | linear_support_vector_machines | 0.894011 | 0.000789 | 0.875260 | 0.002088 | 0.913592 | 0.001998 |
| imdb | director | linear_support_vector_machines | 0.912727 | 0.001597 | 0.880551 | 0.005850 | 0.947397 | 0.003670 |
| imdb | producer | linear_support_vector_machines | 0.905975 | 0.003885 | 0.894173 | 0.006799 | 0.918114 | 0.001219 |
| imdb | musician | linear_support_vector_machines | 0.917906 | 0.001399 | 0.909440 | 0.002345 | 0.926534 | 0.001067 |
| musicbrainz | band | linear_support_vector_machines | 0.913877 | 0.001689 | 0.961199 | 0.003976 | 0.871046 | 0.006050 |
| musicbrainz | musician | linear_support_vector_machines | 0.954410 | 0.001406 | 0.942882 | 0.002112 | 0.966225 | 0.001090 |
| discogs | musician | logistic_regression | 0.938138 | 0.001693 | 0.917022 | 0.002191 | 0.960250 | 0.001370 |
| discogs | band | logistic_regression | 0.924791 | 0.001550 | 0.908952 | 0.001476 | 0.941198 | 0.003025 |
| imdb | director | logistic_regression | 0.913846 | 0.000694 | 0.892154 | 0.004468 | 0.936658 | 0.003834 |
| imdb | producer | logistic_regression | 0.906045 | 0.002643 | 0.898064 | 0.005640 | 0.914192 | 0.000826 |
| imdb | actor | logistic_regression | 0.893573 | 0.000579 | 0.878388 | 0.001838 | 0.909299 | 0.001406 |
| imdb | musician | logistic_regression | 0.917958 | 0.001717 | 0.908442 | 0.002911 | 0.927680 | 0.001280 |
| imdb | writer | logistic_regression | 0.928108 | 0.002082 | 0.924621 | 0.001792 | 0.931635 | 0.004264 |
| musicbrainz | band | logistic_regression | 0.918022 | 0.002327 | 0.937643 | 0.002251 | 0.899217 | 0.003973 |
| musicbrainz | musician | logistic_regression | 0.953083 | 0.001301 | 0.943755 | 0.002042 | 0.962600 | 0.001165 |
| discogs | band | multi_layer_perceptron | 0.928931 | 0.001310 | 0.915249 | 0.001875 | 0.943032 | 0.001727 |
| discogs | musician | multi_layer_perceptron | 0.940715 | 0.001443 | 0.921507 | 0.001770 | 0.960754 | 0.003867 |
| imdb | director | multi_layer_perceptron | 0.919340 | 0.000980 | 0.899994 | 0.002724 | 0.939548 | 0.002002 |
| imdb | actor | multi_layer_perceptron | 0.896922 | 0.000400 | 0.880657 | 0.003412 | 0.913832 | 0.004129 |
| imdb | producer | multi_layer_perceptron | 0.909398 | 0.002913 | 0.894253 | 0.007249 | 0.925119 | 0.003188 |
| imdb | musician | multi_layer_perceptron | 0.925888 | 0.001165 | 0.942860 | 0.001545 | 0.909520 | 0.001863 |
| imdb | writer | multi_layer_perceptron | 0.930302 | 0.002571 | 0.924571 | 0.003714 | 0.936149 | 0.006365 |
| musicbrainz | band | multi_layer_perceptron | 0.921134 | 0.002588 | 0.926546 | 0.004402 | 0.915818 | 0.004989 |
| musicbrainz | musician | multi_layer_perceptron | 0.957375 | 0.001629 | 0.944146 | 0.004491 | 0.971004 | 0.002647 |
| discogs | musician | naive_bayes | 0.931622 | 0.001314 | 0.903489 | 0.002155 | 0.961565 | 0.000812 |
| discogs | band | naive_bayes | 0.922386 | 0.001457 | 0.891462 | 0.001633 | 0.955542 | 0.003274 |
| imdb | musician | naive_bayes | 0.904327 | 0.013959 | 0.868388 | 0.055110 | 0.948158 | 0.033475 |
| imdb | producer | naive_bayes | 0.898657 | 0.004254 | 0.842785 | 0.008230 | 0.962525 | 0.001450 |
| imdb | director | naive_bayes | 0.907755 | 0.002541 | 0.855312 | 0.005063 | 0.967075 | 0.001997 |
| imdb | writer | naive_bayes | 0.922121 | 0.002580 | 0.882778 | 0.004051 | 0.965145 | 0.002172 |
| imdb | actor | naive_bayes | 0.889563 | 0.001311 | 0.825956 | 0.002002 | 0.963789 | 0.001153 |
| musicbrainz | musician | naive_bayes | 0.952783 | 0.001847 | 0.953561 | 0.002394 | 0.952008 | 0.001847 |
| musicbrainz | band | naive_bayes | 0.907094 | 0.001994 | 0.953012 | 0.001335 | 0.865401 | 0.002974 |
| discogs | band | random_forest | 0.921673 | 0.001084 | 0.909216 | 0.002648 | 0.934490 | 0.002988 |
| discogs | musician | random_forest | 0.933460 | 0.002427 | 0.917496 | 0.003021 | 0.949992 | 0.002296 |
| imdb | musician | random_forest | 0.922135 | 0.000594 | 0.932216 | 0.000733 | 0.912272 | 0.001018 |
| imdb | producer | random_forest | 0.906239 | 0.004720 | 0.895270 | 0.008984 | 0.917553 | 0.005214 |
| imdb | director | random_forest | 0.916903 | 0.000940 | 0.892806 | 0.003096 | 0.942355 | 0.002601 |
| imdb | writer | random_forest | 0.929301 | 0.002919 | 0.922216 | 0.004160 | 0.936508 | 0.003393 |
| imdb | actor | random_forest | 0.899161 | 0.000836 | 0.879621 | 0.001427 | 0.919592 | 0.001620 |
| musicbrainz | band | random_forest | 0.918844 | 0.002231 | 0.938321 | 0.005419 | 0.900192 | 0.003714 |
| musicbrainz | musician | random_forest | 0.953673 | 0.001146 | 0.943972 | 0.001591 | 0.963577 | 0.001052 |
| discogs | band | single_layer_perceptron | 0.925647 | 0.001462 | 0.908264 | 0.001465 | 0.943716 | 0.003188 |
| discogs | musician | single_layer_perceptron | 0.938168 | 0.001664 | 0.917755 | 0.002291 | 0.959510 | 0.001258 |
| imdb | actor | single_layer_perceptron | 0.893205 | 0.000591 | 0.881499 | 0.001829 | 0.905232 | 0.001676 |
| imdb | musician | single_layer_perceptron | 0.917503 | 0.001383 | 0.909845 | 0.002659 | 0.925296 | 0.000865 |
| imdb | director | single_layer_perceptron | 0.914070 | 0.000545 | 0.895414 | 0.003822 | 0.933551 | 0.003516 |
| imdb | producer | single_layer_perceptron | 0.905526 | 0.002380 | 0.903146 | 0.006456 | 0.907958 | 0.001993 |
| imdb | writer | single_layer_perceptron | 0.927724 | 0.001915 | 0.924983 | 0.002891 | 0.930499 | 0.003970 |
| musicbrainz | musician | single_layer_perceptron | 0.953054 | 0.001383 | 0.943662 | 0.002094 | 0.962638 | 0.001341 |
| musicbrainz | band | single_layer_perceptron | 0.915977 | 0.002302 | 0.946439 | 0.002538 | 0.887441 | 0.005184 |
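As a quick sanity check on how the three metrics relate: F1 is the harmonic mean of precision and recall. The sketch below plugs the mean precision and mean recall of the first row into that formula. Because F1.Mean is presumably averaged over several runs (an assumption, the thread does not give the evaluation protocol), the result only approximates the reported value rather than matching it exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Prec.Mean and Recall.Mean for discogs / band / linear_support_vector_machines.
approx_f1 = f1_score(0.896373, 0.956090)
reported_f1 = 0.925262  # F1.Mean from the same row

# The two values agree to within about 1e-5.
difference = abs(approx_f1 - reported_f1)
```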

Average results

Averaging the scores obtained for each classifier over all catalog/entity pairs, we get the following table (sorted from best to worst F1):

| Model | Average F1 | Average F1.STD | Average Prec | Average Prec.STD | Average Recall | Average Recall.STD |
| --- | --- | --- | --- | --- | --- | --- |
| multi_layer_perceptron | 0.925556 | 0.001667 | 0.916643 | 0.003465 | 0.934975 | 0.003420 |
| random_forest | 0.922377 | 0.001877 | 0.914570 | 0.003453 | 0.930726 | 0.002655 |
| logistic_regression | 0.921507 | 0.001621 | 0.912116 | 0.002734 | 0.931414 | 0.002349 |
| single_layer_perceptron | 0.921208 | 0.001514 | 0.914556 | 0.002894 | 0.928427 | 0.002555 |
| linear_support_vector_machines | 0.921080 | 0.001693 | 0.909367 | 0.003293 | 0.934240 | 0.002548 |
| naive_bayes | 0.915145 | 0.003473 | 0.886305 | 0.009108 | 0.949023 | 0.005462 |
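The averages in this table can be reproduced as a plain unweighted mean over the nine catalog/entity rows of each model. The sketch below does this for the best and worst model, using the F1.Mean values copied from the evaluation table; the variable names are only illustrative.

```python
from statistics import fmean

# F1.Mean values copied from the evaluation table, one per catalog/entity pair.
f1_means = {
    "multi_layer_perceptron": [
        0.928931, 0.940715, 0.919340, 0.896922, 0.909398,
        0.925888, 0.930302, 0.921134, 0.957375,
    ],
    "naive_bayes": [
        0.931622, 0.922386, 0.904327, 0.898657, 0.907755,
        0.922121, 0.889563, 0.952783, 0.907094,
    ],
}

# Unweighted mean per model, then rank from best to worst F1.
averages = {model: fmean(values) for model, values in f1_means.items()}
ranking = sorted(averages, key=averages.get, reverse=True)
```

Note that the mean is unweighted: a small set such as imdb producer counts as much as a large one such as imdb musician.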


F1 bar plot


Precision bar plot


Recall bar plot


Precision vs recall for each model


tupini07 commented 5 years ago

This needs to be redone using the new Wikidata dataset.

tupini07 commented 5 years ago

Base performances updated!