tnigon closed this issue 4 years ago.
`Tuning` inherits from `FeatureSelection`. The inherited design should allow for a seamless transition among various models, e.g., run Lasso first, then run PLS (resetting `regressor`, `regressor_params`, and `param_grid`). `df_tune_filter` will include the highest-scoring tuning results for each number of features and each model (sorted by "regressor", then by "feat_n").

```python
from research_tools import feature_groups
from research_tools import Tuning

my_tune = Tuning(param_dict=feature_groups.param_dict_test)
my_tune.tune_regressor(print_out_tune=True)
```
```
must be set to create README file.
Getting feature data...
Performing feature selection...
Executing hyperparameter tuning...
Number of features: 1
Lasso: R2: 0.120
Number of features: 2
Lasso: R2: 0.785
Number of features: 3
Lasso: R2: 0.805
Number of features: 4
Lasso: R2: 0.805
Number of features: 5
Lasso: R2: 0.816
Number of features: 4
Lasso: R2: 0.810
Number of features: 3
Lasso: R2: 0.808
```
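To illustrate the intended model swap, here is a minimal sketch continuing the session above. It assumes `regressor`, `regressor_params`, and `param_grid` can simply be reassigned between calls to `tune_regressor()`; that access pattern is an assumption, not confirmed API.

```python
from sklearn.cross_decomposition import PLSRegression

# Assumption: these attributes are plain, reassignable members of Tuning.
my_tune.regressor = PLSRegression()
my_tune.regressor_params = {'scale': True}           # illustrative only
my_tune.param_grid = {'n_components': [2, 3, 4, 5]}  # illustrative only
my_tune.tune_regressor(print_out_tune=True)

# df_tune_filter should now hold the best result per "feat_n" for each
# model, sorted by "regressor", then by "feat_n".
print(my_tune.df_tune_filter.head())
```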
Add training functionality to the tuning class.

Because this class performs training, it was renamed to `Training`. Its only function (as of now) is `train()`, which first executes hyperparameter tuning and saves results to `df_tune`, then trains the estimator and creates `df_train` for each number of features.
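The internals of `train()` aren't shown here, but the same two-step pattern (tune, then fit per number of features) can be illustrated with plain scikit-learn; this is an analogy, not the library's implementation:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, noise=5, random_state=0)

rows = []
for feat_n in range(1, X.shape[1] + 1):
    # Step 1: hyperparameter tuning restricted to the first feat_n features
    gs = GridSearchCV(Lasso(max_iter=10000), {'alpha': [0.01, 0.1, 1.0]}, cv=5)
    gs.fit(X[:, :feat_n], y)
    # Step 2: GridSearchCV refits the best estimator, i.e., trains the model
    rows.append({'feat_n': feat_n,
                 'alpha': gs.best_params_['alpha'],
                 'R2': gs.best_score_})

df_train = pd.DataFrame(rows)  # loosely analogous to Training.df_train
print(df_train)
```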
Getting closer to closing this issue, and will do so when `df_test_preds` is added. Graphing/plotting will then be added as a separate issue.
Seems to be working as intended, and unit tests are running to get full code coverage. There is not yet the ability to flag whether a tuning and/or test has already been performed by this `Training` instance, so there is a possibility of having [almost] duplicate rows in `df_tune` and `df_test` ("uid" and index will be different).

Column data from `df_pred` can be indexed according to the "uid" column in `df_test` and `df_test_filtered`.
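A short sketch of that lookup, assuming `df_pred` stores each run's predictions in a column keyed by "uid" (the frame layouts below are hypothetical, built only to mirror the description above):

```python
import pandas as pd

# Hypothetical layouts: df_test has one row per run keyed by "uid";
# df_pred holds that run's predictions as a column of the same name.
df_test = pd.DataFrame({'uid': [101, 102], 'feat_n': [3, 5]})
df_pred = pd.DataFrame({101: [2.1, 2.4], 102: [2.0, 2.6]})

uid = df_test.loc[df_test['feat_n'] == 5, 'uid'].iloc[0]
preds = df_pred[uid]  # predictions for the run selected from df_test
print(preds)
```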
Use:

```python
from research_tools import feature_groups
from research_tools import Training

my_train = Training(param_dict=feature_groups.param_dict_test, print_out=False)
my_train.train()
```

```
must be set to create README file.
Getting feature data...
Performing feature selection...
Executing hyperparameter tuning and estimator training...
```
Not sure if this is the best way to architect this, but I don't see any glaring reason that it wouldn't work well for our purposes.

Basically, the idea is to have a `feature_data` class instance for any data subset that we would like to train on. This could be satellite image data, drone imagery, weather data, management data, etc., or any combination thereof (see issue #9). The role of the `feature_data` instance is to provide functionality to access the desired data, join it together into a cohesive dataframe, and create the X and y matrices that will be used by `sklearn`.

Next, we have to perform hyperparameter tuning for whichever model we decide to use. To do this, I propose we create a `tuning` class that inherits from `feature_data`; basically, it needs all of the data we just accessed and organized using `feature_data` to actually perform the tuning.

To do:
- Create the `Tuning` class on initialization (must inherit from `FeatureSelection`).
- Think about how tuning results should be stored, especially in regard to training multiple different models (Lasso, PLSR, random forest, etc.).
- Think about how feature selection should be handled. Is this a "built-in" for `tuning`, or is it perhaps an additional class object (that probably also inherits from `feature_data`)? One possible shape of that chain is sketched below.
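As a sketch of the "additional class object" option, here is one possible shape of the inheritance chain; every class body below is an illustrative placeholder, not the library's implementation:

```python
import numpy as np

class FeatureData:
    """Access the data subsets, join them, and build X and y for sklearn."""
    def __init__(self, param_dict):
        self.param_dict = param_dict
        rng = np.random.default_rng(0)  # placeholder for the real data join
        self.X, self.y = rng.normal(size=(20, 4)), rng.normal(size=20)

class FeatureSelection(FeatureData):
    """Feature selection as its own class, inheriting the organized data."""
    def select(self, feat_n):
        return self.X[:, :feat_n]  # stand-in for a real selection method

class Tuning(FeatureSelection):
    """Needs everything above to actually perform the tuning."""
    def tune_regressor(self):
        pass  # cross-validated hyperparameter search would go here
```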