EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org
Other
551 stars 67 forks source link

Refactor code in preparation for voting learners #622

Closed desilinguist closed 4 years ago

desilinguist commented 4 years ago

As I was working on implementing a VotingLearner class (for #488), I realized that this implementation would end up sharing a lot of the code with methods for the Learner class. One way to handle this is to make VotingLearner inherit from Learner but that's not really right because, in fact, VotingLearner uses multiple underlying Learner instances. So, the solution I came up with is to refactor as much of this shared code into functions in learner.utils that can the be used by both Learner instances as well as VotingLearner instances (and, hopefully, StackingLearner instances too, when we get to those).

Specifically, this PR refactors:

In addition, This PR closes #621 and adds a test to make sure that the predictions being returned and written out via Learner.predict() match expectations. As part of this fix, the default value of the class_labels keyword argument for Learner.predict() is now set to True instead of False since it doesn't make sense to return class indices by default. 💥 This will be a an API-breaking change since to get class probabilities as outputs, class_labels will explicitly need to be set to False . 💥

Finally, this PR improves many docstrings, replaces single quotes with double quotes in some places, and replaces the old-style format strings with new-style ones in some of the code that was touched.

pep8speaks commented 4 years ago

Hello @desilinguist! Thanks for updating this PR.

Line 1802:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-07-01 16:43:16 UTC
desilinguist commented 4 years ago

Not sure why it doesn't show here but the Travis builds are actually complete.

desilinguist commented 4 years ago

Actually, this is not done - just found another simplification I can make. Please don't review it yet.

codecov[bot] commented 4 years ago

Codecov Report

Merging #622 into master will decrease coverage by 0.04%. The diff coverage is 93.98%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #622      +/-   ##
==========================================
- Coverage   95.15%   95.10%   -0.05%     
==========================================
  Files          26       27       +1     
  Lines        3031     3083      +52     
==========================================
+ Hits         2884     2932      +48     
- Misses        147      151       +4     
Impacted Files Coverage Δ
skll/learner/utils.py 94.94% <93.52%> (-0.94%) :arrow_down:
skll/learner/__init__.py 96.24% <94.73%> (+0.13%) :arrow_up:
skll/__init__.py 100.00% <100.00%> (ø)
skll/experiments/__init__.py 95.19% <100.00%> (ø)
skll/utils/commandline/generate_predictions.py 98.59% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7e413e8...e876bbb. Read the comment docs.

desilinguist commented 4 years ago

@bndgyawali @mulhod @aoifecahill this is now ready for review. The minor code coverage decrease is expected due to the refactoring.