desilinguist commented 4 years ago

This PR closes #600.

Several SKLL modules (learner.py, experiments.py, config.py, etc.) had grown into very long single files, which made new development unwieldy. This PR addresses the problem by converting these modules into sub-packages and moving specific functions into separate modules under each sub-package.

Here's what the SKLL code tree looks like today:

.
├── __init__.py
├── config.py
├── data
│   ├── __init__.py
│   ├── dict_vectorizer.py
│   ├── featureset.py
│   ├── readers.py
│   └── writers.py
├── experiments.py
├── learner.py
├── logutils.py
├── metrics.py
├── utilities
│   ├── __init__.py
│   ├── compute_eval_from_predictions.py
│   ├── filter_features.py
│   ├── generate_predictions.py
│   ├── join_features.py
│   ├── plot_learning_curves.py
│   ├── print_model_weights.py
│   ├── run_experiment.py
│   ├── skll_convert.py
│   └── summarize_results.py
└── version.py

13 directories, 84 files

With this PR, it will look like:

.
├── __init__.py
├── config
│   ├── __init__.py
│   └── utils.py
├── data
│   ├── __init__.py
│   ├── dict_vectorizer.py
│   ├── featureset.py
│   ├── readers.py
│   └── writers.py
├── experiments
│   ├── __init__.py
│   ├── input.py
│   ├── output.py
│   └── utils.py
├── learner
│   ├── __init__.py
│   └── utils.py
├── metrics.py
├── utils
│   ├── __init__.py
│   ├── commandline
│   │   ├── __init__.py
│   │   ├── compute_eval_from_predictions.py
│   │   ├── filter_features.py
│   │   ├── generate_predictions.py
│   │   ├── join_features.py
│   │   ├── plot_learning_curves.py
│   │   ├── print_model_weights.py
│   │   ├── run_experiment.py
│   │   ├── skll_convert.py
│   │   └── summarize_results.py
│   ├── constants.py
│   └── logging.py
└── version.py

12 directories, 91 files
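
As a rough illustration of the kind of move shown in the tree above, the sketch below builds a throwaway package on disk in which a former single module becomes a sub-package whose `__init__.py` keeps the main class importable from the old path. Every name in it (`demo_pkg`, `Learner`, `rescale`) is hypothetical and chosen only for the example; it is not SKLL's actual code or layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors a "module -> sub-package" move.
# All names here (demo_pkg, Learner, rescale) are hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg" / "learner"
pkg.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")

# Helper code that used to sit in one long learner.py now lives in learner/utils.py.
(pkg / "utils.py").write_text(
    "def rescale(values, factor):\n"
    "    return [v * factor for v in values]\n"
)

# learner/__init__.py defines the main class, so the old import path
# `from demo_pkg.learner import Learner` keeps working unchanged.
(pkg / "__init__.py").write_text(
    "from .utils import rescale\n"
    "\n"
    "\n"
    "class Learner:\n"
    "    def predict(self, values):\n"
    "        return rescale(values, 2)\n"
    "\n"
    "\n"
    "__all__ = ['Learner']\n"
)

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg.learner import Learner        # same path as before the split
from demo_pkg.learner.utils import rescale  # helper reachable via its new module

predictions = Learner().predict([1, 2, 3])
print(predictions)
```

Callers that only ever imported `Learner` see no difference, while the helper gains an explicit home in its own module.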

In addition, some functions that were previously marked as private (with a leading underscore) are now public, since they can be quite useful as part of the API. Examples include experiments.input.load_featureset() and experiments.output.generate_learning_curve_plots().

Specifically, this PR:

- Reorganizes and refactors SKLL module code into sub-packages.
- Sets up __init__.py for these sub-packages such that most of the idiomatic SKLL imports work just like they do now.
- Removes unnecessary imports from the top-level skll namespace. After this change, only the following are importable from skll: FeatureSet, Learner, and run_configuration.
- Moves logutils.py to utils/logging.py.
- Consolidates all the constants from across the various modules into a single place: utils/constants.py.
- Moves all the command line utilities under utils/commandline/ and adjusts setup.py accordingly.
- Converts the private _import_custom_learner() function to a public function called load_custom_learner(). The new function does not modify globals() directly; it returns the loaded learner to the caller, which then modifies globals() on its end.
- Renames test_logutils.py and test_utilities.py to test_logging_utils.py and test_commandline_utils.py, respectively.
- Updates the API documentation to remove things that are not top-level importable, to include the new sub-packages and constants, and to tweak the names of sections.
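
The load_custom_learner() change in the list above follows a common pattern: the loader returns the class, and the caller decides whether to register it in globals(). The following is a minimal sketch of that pattern, not SKLL's implementation. The function name `load_learner_class`, its parameters, and the `MajorityLearner` stand-in are all assumptions made for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_learner_class(learner_path, learner_name):
    """Hypothetical loader: import a class from a Python file and return it.

    The function has no side effects on the caller's namespace.
    """
    module_name = Path(learner_path).stem
    spec = importlib.util.spec_from_file_location(module_name, learner_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, learner_name)


# Write a tiny stand-in learner to a temporary file (illustrative only).
tmp_dir = Path(tempfile.mkdtemp())
learner_file = tmp_dir / "majority_learner.py"
learner_file.write_text(
    "class MajorityLearner:\n"
    "    def predict(self, labels):\n"
    "        return max(set(labels), key=labels.count)\n"
)

# The caller receives the class and chooses to register it in its own globals().
learner_cls = load_learner_class(learner_file, "MajorityLearner")
globals()[learner_cls.__name__] = learner_cls

prediction = globals()["MajorityLearner"]().predict(["a", "b", "a"])
print(prediction)
```

Keeping the namespace mutation in the caller makes the loader easier to test in isolation and makes the side effect visible at the call site.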

pep8speaks commented 4 years ago

Hello @desilinguist! Thanks for updating this PR.

In the file setup.py:

Line 41:101: E501 line too long (113 > 100 characters)

In the file skll/config/__init__.py:

Line 769:101: E501 line too long (106 > 100 characters)

In the file tests/test_classification.py:

Line 1115:101: E501 line too long (114 > 100 characters)
Line 1117:101: E501 line too long (114 > 100 characters)
Line 1119:101: E501 line too long (117 > 100 characters)
Line 1165:101: E501 line too long (114 > 100 characters)
Line 1167:101: E501 line too long (114 > 100 characters)
Line 1169:101: E501 line too long (117 > 100 characters)

Comment last updated at 2020-04-11 14:00:59 UTC

codecov[bot] commented 4 years ago

Codecov Report

Merging #601 into master will increase coverage by 0.10%. The diff coverage is 96.87%.

@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   95.06%   95.16%   +0.10%     
==========================================
  Files          20       26       +6     
  Lines        2977     3021      +44     
==========================================
+ Hits         2830     2875      +45     
+ Misses        147      146       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| skll/metrics.py | `96.87% <ø> (-0.27%)` | :arrow_down: |
| ...utils/commandline/compute_eval_from_predictions.py | `97.18% <ø> (ø)` | |
| skll/utils/commandline/filter_features.py | `98.41% <ø> (ø)` | |
| skll/utils/commandline/generate_predictions.py | `98.59% <ø> (ø)` | |
| skll/utils/commandline/join_features.py | `98.14% <ø> (ø)` | |
| skll/utils/commandline/print_model_weights.py | `94.91% <ø> (ø)` | |
| skll/utils/commandline/run_experiment.py | `96.77% <ø> (ø)` | |
| skll/experiments/utils.py | `93.51% <93.51%> (ø)` | |
| skll/config/utils.py | `96.00% <96.00%> (ø)` | |
| skll/learner/utils.py | `96.31% <96.31%> (ø)` | |
| ... and 16 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 1d64e25...996d44f.

EducationalTestingService / skll

Convert modules into sub-packages #601
