desilinguist commented 4 years ago

As I was working on implementing a VotingLearner class (for #488), I realized that this implementation would end up sharing a lot of the code with methods for the Learner class. One way to handle this is to make VotingLearner inherit from Learner but that's not really right because, in fact, VotingLearner uses multiple underlying Learner instances. So, the solution I came up with is to refactor as much of this shared code into functions in learner.utils that can the be used by both Learner instances as well as VotingLearner instances (and, hopefully, StackingLearner instances too, when we get to those).

Specifically, this PR refactors:

code that computes evaluation metrics from Learner.evaluate() into learner.utils.compute_evaluation_metrics().
code that writes predictions to files from Learner.predict() into learner.utils.write_predictions().
code that merges any unseen test set labels with training labels from Learner.evaluate() into learner.utils.add_unseen_labels().
code that computes the number of folds based on the number of examples for each label from Learner._compute_num_folds_from_example_counts() into learner.utils.compute_num_folds_from_example_counts().
code that computes various types of predictions (raw, class labels, class indices, probabilities) from Learner.predict() into learner.utils.get_predictions().
code that sets up the cross-validation fold iterator from Learner.cross_validate() into learner.utils.setup_cv_fold_iterator().
code that sets up the cross-validation split iterator from Learner.learning_curve() into learner.utils.setup_cv_split_iterator().

In addition, This PR closes #621 and adds a test to make sure that the predictions being returned and written out via Learner.predict() match expectations. As part of this fix, the default value of the class_labels keyword argument for Learner.predict() is now set to True instead of False since it doesn't make sense to return class indices by default. 💥 This will be a an API-breaking change since to get class probabilities as outputs, class_labels will explicitly need to be set to False . 💥

Finally, this PR improves many docstrings, replaces single quotes with double quotes in some places, and replaces the old-style format strings with new-style ones in some of the code that was touched.

pep8speaks commented 4 years ago

Hello @desilinguist! Thanks for updating this PR.

In the file tests/test_classification.py:

Line 1802:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-07-01 16:43:16 UTC

desilinguist commented 4 years ago

Not sure why it doesn't show here but the Travis builds are actually complete.

desilinguist commented 4 years ago

Actually, this is not done - just found another simplification I can make. Please don't review it yet.

codecov[bot] commented 4 years ago

Codecov Report

Merging #622 into master will decrease coverage by 0.04%. The diff coverage is 93.98%.

@@            Coverage Diff             @@
##           master     #622      +/-   ##
==========================================
- Coverage   95.15%   95.10%   -0.05%     
==========================================
  Files          26       27       +1     
  Lines        3031     3083      +52     
==========================================
+ Hits         2884     2932      +48     
- Misses        147      151       +4

Impacted Files	Coverage Δ
skll/learner/utils.py	`94.94% <93.52%> (-0.94%)`	:arrow_down:
skll/learner/__init__.py	`96.24% <94.73%> (+0.13%)`	:arrow_up:
skll/__init__.py	`100.00% <100.00%> (ø)`
skll/experiments/__init__.py	`95.19% <100.00%> (ø)`
skll/utils/commandline/generate_predictions.py	`98.59% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7e413e8...e876bbb. Read the comment docs.

desilinguist commented 4 years ago

@bndgyawali @mulhod @aoifecahill this is now ready for review. The minor code coverage decrease is expected due to the refactoring.

EducationalTestingService / skll

Refactor code in preparation for voting learners #622

Comment last updated at 2020-07-01 16:43:16 UTC

Codecov Report