EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org

Add support for custom metrics #612

Closed desilinguist closed 4 years ago

desilinguist commented 4 years ago

This PR closes #606, closes #609, and closes #610.

Custom Metrics:

  1. SKLL can now use arbitrary custom metric functions for both tuning and evaluation.
  2. The details of how this works are described in #606. The only tricky bit not covered there is how the dynamic import works: we take the Python file specified by the user, import it as a sub-module of skll.metrics, and then register the metric in SCORERS so that its name points to skll.metrics.<filename>.<function>.
  3. Added a new file test_custom_metrics.py to test the custom metric functionality.
  4. Added a new section, "Using custom metrics", dedicated to this functionality to the documentation.
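The dynamic import described in point 2 can be sketched roughly as follows. This is a simplified, hypothetical illustration, not SKLL's actual code: it uses a plain dict in place of scikit-learn's scorer registry, and the helper and file names are made up for the example.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Stand-in for the scorer registry; SKLL registers into scikit-learn's
# SCORERS dictionary, but a plain dict keeps this sketch self-contained.
CUSTOM_SCORERS = {}

def register_custom_metric(metric_file, metric_name):
    """Import ``metric_file`` as a sub-module of ``skll.metrics`` and
    point ``metric_name`` at ``skll.metrics.<filename>.<function>``."""
    stem = Path(metric_file).stem
    qualified_name = f"skll.metrics.{stem}"
    spec = importlib.util.spec_from_file_location(qualified_name, metric_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[qualified_name] = module   # make it importable by that name
    spec.loader.exec_module(module)
    CUSTOM_SCORERS[metric_name] = getattr(module, metric_name)

# Example: a user-supplied metric file containing a simple
# accuracy-like function (hypothetical contents).
with tempfile.TemporaryDirectory() as tmpdir:
    metric_path = Path(tmpdir) / "my_metrics.py"
    metric_path.write_text(
        "def fraction_correct(y_true, y_pred):\n"
        "    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)\n"
    )
    register_custom_metric(str(metric_path), "fraction_correct")
    score = CUSTOM_SCORERS["fraction_correct"]([1, 0, 1, 1], [1, 0, 0, 1])
    print(score)  # → 0.75
```

The key point is that registering the module under a qualified name in sys.modules makes the metric importable as skll.metrics.my_metrics.fraction_correct, which matters for the serialization issues discussed later in this thread.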

Add new metrics:

  1. Added all variants of the jaccard_score metric from scikit-learn since those can be quite useful.
  2. Added the non-binary variants of precision and recall to be consistent with f1_score and f0.5_score.
  3. Added all new metrics to the documentation.
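The averaged variants mentioned above can be built with scikit-learn's make_scorer. The scorer names below follow SKLL's <metric>_<average> convention only loosely and are assumptions for illustration; the exact names SKLL exposes may differ.

```python
from sklearn.metrics import jaccard_score, make_scorer

# Hypothetical scorer names mirroring the averaged-variant convention
# (e.g. f1_score_micro); the names SKLL actually registers may differ.
jaccard_scorers = {
    f"jaccard_{average}": make_scorer(jaccard_score, average=average)
    for average in ("micro", "macro", "weighted")
}

# The underlying metric on a small multiclass example:
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]
macro = jaccard_score(y_true, y_pred, average="macro")
print(macro)
```

Each averaging mode computes the per-class Jaccard score (TP / (TP + FP + FN)) and then aggregates: "macro" takes an unweighted mean over classes, "weighted" weights by class support, and "micro" pools all classes' counts first.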
pep8speaks commented 4 years ago

Hello @desilinguist! Thanks for updating this PR.

Line 183:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-05-28 20:38:15 UTC
codecov[bot] commented 4 years ago

Codecov Report

Merging #612 into master will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   95.18%   95.18%           
=======================================
  Files          26       26           
  Lines        3031     3031           
=======================================
  Hits         2885     2885           
  Misses        146      146           

Continue to review full report at Codecov.

Last update 89e4fdc...89e4fdc.

desilinguist commented 4 years ago

Okay, I have modified the implementation so that custom metrics are now properly serialized when using gridmap. However, to support gridmap, we unfortunately had to give up the ability to identify invalid metrics in the configuration file at parsing time; that check is now deferred until right before the job is submitted. This shouldn't be too bad. The deferral was necessary because any potential custom metrics are now registered inside _classify_featureset(), so we cannot declare a metric invalid before that point. This also meant removing a couple of config-parsing tests.
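The deferred check described above might look something like the following. This is a hypothetical sketch with made-up helper and registry names, not the code from this PR; it only illustrates validating metric names after custom-metric registration rather than at config-parse time.

```python
# Built-in metric names (illustrative subset, not SKLL's full list).
KNOWN_METRICS = {"accuracy", "f1_score_micro"}

def validate_metrics(metric_names, custom_metrics):
    """Raise for any requested metric that is neither built in nor custom.

    In the scheme described above, this runs right before job submission,
    after custom metrics have been registered, rather than at config
    parsing time.
    """
    registered = KNOWN_METRICS | set(custom_metrics)
    invalid = [name for name in metric_names if name not in registered]
    if invalid:
        raise ValueError(f"Invalid metric(s): {', '.join(invalid)}")

# A custom metric registered earlier makes its name valid:
validate_metrics(["accuracy", "fraction_correct"],
                 {"fraction_correct": lambda yt, yp: 0.0})

# An unknown name is only caught at this late stage:
try:
    validate_metrics(["bogus_metric"], {})
except ValueError as e:
    print(e)  # → Invalid metric(s): bogus_metric
```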

@mulhod, can you please re-run your gridmap tests and any other tests you can think of? Thanks! @bndgyawali, if you have time to review this too, that'd be great!