Wikidata / soweego

Link Wikidata items to large catalogs
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
GNU General Public License v3.0

Find optimal hyperparameters for current classifiers used in ensembles #361

Closed: tupini07 closed this issue 5 years ago

tupini07 commented 5 years ago

Use the hyperparameter grid search functionality provided by the evaluate module to find the optimal hyperparameters for the classifiers that will make up the ensembles.

Even though LSVM is not part of the ensembles, its best hyperparameters have also been calculated.
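For readers unfamiliar with the technique, the exhaustive grid search described in this issue can be sketched roughly as follows. This is a minimal, library-free illustration and not soweego's evaluate module: `grid_search`, `toy_score`, and the parameter names `C` and `penalty` are all hypothetical, and `toy_score` is only a stand-in for a real cross-validated evaluation of a classifier.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one.

    param_grid maps a hyperparameter name to the list of values to try.
    score_fn takes a dict of hyperparameters and returns a score
    (higher is better). Both are hypothetical names for this sketch.
    """
    names = sorted(param_grid)
    results = []
    # The Cartesian product enumerates every point of the grid.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(params):
    """Toy stand-in for cross-validated performance (an assumption)."""
    penalty_cost = 0.1 if params["penalty"] == "l1" else 0.0
    return -((params["C"] - 1.0) ** 2) - penalty_cost


# Hypothetical grid: 4 values of C times 2 penalties = 8 combinations.
grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
best_params, best_score, results = grid_search(grid, toy_score)
print(best_params, len(results))
```

In a real run, the scoring function would train and evaluate the classifier on the chosen dataset, so the cost grows with the number of grid points. That is one reason to keep the preliminary search small.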

tupini07 commented 5 years ago

To keep this simple, we will perform the grid search on only one catalog/entity pair (specifically, discogs/musician).

The reason is that we wanted to tune the classifiers on one of the most difficult datasets. The most difficult ones are actually the discogs and musicbrainz band datasets, but these are not about people (the main domain in which soweego operates), so the hardest dataset about people was chosen.

tupini07 commented 5 years ago

This task is just a preliminary search for the best hyperparameters, needed so that we can start working on the ensembles.

A more in-depth investigation of the best hyperparameters will be done for each entity/catalog pair separately as part of #362.

tupini07 commented 5 years ago

Grid search has finished for all classifiers used in the ensembles and for LSVM. The full results are available here: grid_search_results_discogs_musician.zip

Models have been configured to use these 'best hyperparameters' by default.
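One common way to make tuned values the default while still allowing callers to override them is sketched below. This is only an illustration of the pattern, not soweego's actual configuration code: `BEST_HYPERPARAMETERS`, `build_params`, the classifier name, and every value shown are hypothetical placeholders, not results from the grid search.

```python
# Hypothetical placeholder values, NOT the results of the grid search.
BEST_HYPERPARAMETERS = {
    "example_classifier": {"alpha": 0.1, "max_iter": 100},
}


def build_params(classifier_name, **overrides):
    """Return the stored defaults for a classifier, updated with overrides.

    A new dict is built on every call, so the stored defaults
    are never mutated by the caller.
    """
    defaults = BEST_HYPERPARAMETERS.get(classifier_name, {})
    return {**defaults, **overrides}


default_params = build_params("example_classifier")
custom_params = build_params("example_classifier", alpha=0.5)
```

With this layout, callers that pass nothing get the tuned values, and an explicit keyword argument always takes precedence.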