Project dependencies may have API risk issues

Hi, in MLBox, inappropriate dependency version constraints can cause risks.

Below are the dependencies and version constraints that the project currently uses:

numpy==1.18.2
scipy==1.4.1
matplotlib==3.0.3
hyperopt==0.2.3
pandas==0.25.3
joblib==0.14.1
scikit-learn==0.22.1
tensorflow==2.0.0
lightgbm==2.3.1
tables==3.5.2
xlrd==1.2.0

The == constraint introduces a risk of dependency conflicts because it is too strict: if another package in the same environment requires a different version of the same dependency, no single version can satisfy both. Conversely, constraints with no upper bound (or *) introduce a risk of missing-API errors, because the newest release of a dependency may remove APIs that the project calls.

After further analysis, the version constraints in this project can be changed as follows:

matplotlib>=1.3.0,<=3.0.3
joblib==0.7.0d
joblib>=0.3.6.dev,<=1.1.0
scikit-learn>=0.20rc1,<=0.20.4

These changes reduce the risk of dependency conflicts as much as possible, while still allowing the newest versions that do not cause API errors in the project.
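
To make the trade-off concrete, here is a simplified sketch of how a resolver chooses a version. It is an illustration only: it handles plain numeric versions, not full PEP 440 (which pip implements through the `packaging` library), and the release list and the second package's `<3.0` requirement are hypothetical. With the exact pin ==3.0.3 the two requirements cannot both be met, with the suggested range >=1.3.0,<=3.0.3 a common version exists, and without an upper bound the newest release is chosen even if it removed an API the project uses.

```python
import operator

# Comparison operators understood by this simplified sketch.
# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    ("==", operator.eq),
    ("!=", operator.ne),
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple:
    """Turn '3.0.3' into (3, 0, 3, 0). Only plain numeric versions are supported."""
    parts = [int(p) for p in text.split(".")]
    # Pad with zeros so that '3.0' and '3.0.0' compare as equal.
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec such as '>=1.3.0,<=3.0.3'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve(available, *specs):
    """Return the newest available version accepted by every spec, or None on conflict."""
    candidates = [v for v in available if all(satisfies(v, s) for s in specs)]
    return max(candidates, key=parse_version) if candidates else None


# Illustrative subset of matplotlib releases (hypothetical list for this example).
releases = ["1.3.0", "2.2.5", "3.0.2", "3.0.3", "3.1.0", "3.2.1"]

# Hypothetical second package in the same environment requires matplotlib<3.0.
print(resolve(releases, "==3.0.3", "<3.0"))           # None: the exact pin conflicts
print(resolve(releases, ">=1.3.0,<=3.0.3", "<3.0"))   # 2.2.5: the range leaves room
print(resolve(releases, ">=1.3.0,<=3.0.3"))           # 3.0.3: newest within the bound
print(resolve(releases, ">=1.3.0"))                   # 3.2.1: no upper bound
```

An upper bound caps the resolver at a release known to work, while a lower bound (instead of ==) leaves room for other packages' requirements.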

The project currently calls all of the following methods.

Methods called from matplotlib:

matplotlib.use

Methods called from joblib:

joblib.delayed
joblib.Parallel

Methods called from scikit-learn:

sklearn.tree.DecisionTreeRegressor
sklearn.ensemble.RandomForestRegressor
sklearn.linear_model.LinearRegression
sklearn.linear_model.Ridge
sklearn.ensemble.ExtraTreesRegressor
sklearn.ensemble.AdaBoostClassifier
sklearn.preprocessing.LabelEncoder
joblib.delayed
sklearn.ensemble.RandomForestClassifier
sklearn.impute.SimpleImputer
sklearn.preprocessing.LabelEncoder.fit_transform
sklearn.tree.DecisionTreeClassifier
sklearn.ensemble.BaggingClassifier
sklearn.ensemble.AdaBoostRegressor
sklearn.linear_model.LogisticRegression
joblib.Parallel
sklearn.ensemble.ExtraTreesClassifier
sklearn.ensemble.BaggingRegressor
sklearn.linear_model.Lasso
sklearn.metrics.roc_auc_score
sklearn.metrics.make_scorer
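
Whether each API listed above is still present in a given installed version can be checked mechanically. The sketch below is a minimal, standard-library-only approach; `api_exists` is a hypothetical helper, not part of MLBox or of any of these libraries. It imports the longest importable module prefix of a dotted name and then walks the remaining parts with getattr. Note that it returns False both when an attribute is missing and when the package is not installed at all.

```python
import importlib


def api_exists(dotted: str) -> bool:
    """Return True if a dotted name like 'sklearn.impute.SimpleImputer' resolves."""
    parts = dotted.split(".")
    # Try the longest module path first, e.g. 'sklearn.impute', then 'sklearn'.
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # not a module (or not installed); try a shorter prefix
        # Walk the remaining attributes, e.g. a class and then one of its methods.
        for attr in parts[i:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False  # no prefix was importable: the package is not installed


# Check the calls listed above against whatever versions are installed.
for name in ["matplotlib.use", "joblib.delayed", "joblib.Parallel",
             "sklearn.impute.SimpleImputer", "sklearn.metrics.make_scorer"]:
    print(name, api_exists(name))
```

Running this once per candidate version (for example in separate virtual environments) gives a quick check that a proposed range does not drop any called API.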

All methods called by the project:

self.__Lnum.df.fillna
self.fit_transform
x.col.self.__Enc.col.df.apply.tolist
self.set_params
readme_file.read
col.df_train.apply
setattr
drift.DriftThreshold.get_support
estimator.fit
encoding.categorical_encoder.Categorical_encoder
pandas.datetime
i.col.x.get_embeddings.col.df.apply.tolist
clf.predict_proba
numpy.arange
print
y_train.drop.drop
est.get_params.items
tensorflow.keras.layers.Dense
mlbox.preprocessing.Reader.train_test_split
serie_to_df.hour.astype
self.transform
y.value_counts
warnings.warn
pandas.datetime.serie.pandas.DatetimeIndex.total_seconds
pandas.Series.describe
mlbox.optimisation.make_scorer
self.get_estimator
convert_list
sklearn.ensemble.RandomForestRegressor.fit
numpy.shape
self.__cv.split
classifier.Classifier
col.df.apply
self.clean
model.regression.feature_selector.Reg_feature_selector
self.__classifier.score
serie.pandas.DatetimeIndex.dayofweek.astype
tensorflow.keras.layers.concatenate
pandas.read_csv
pipe.append
self.__regressor.predict
len
self.fit
mlbox.preprocessing.Drift_thresholder
tensorflow.keras.models.Model.get_weights
self.__classifier.get_params.keys
hyperopt.hp.choice
self.__set_regressor
pp.set_params.predict
y_train.drop.apply
matplotlib.pyplot.savefig
pandas.read_json
model.get_estimator.get_params.items
selected_col.append
lightgbm.LGBMRegressor
tensorflow.keras.layers.Reshape
df_train.drop_duplicates.keys
path.split
drift_estimator.DriftEstimator
pandas.DataFrame.head
sorted.remove
col.df_train.dropna.unique
dropout1.Dropout
keepList.append
hyperopt.hp.uniform
y_train.pd.get_dummies.astype
pandas.Series.value_counts
time.time
sklearn.metrics.roc_auc_score
open.close
zip
d.copy
sklearn.linear_model.LinearRegression
sklearn.ensemble.ExtraTreesRegressor
pandas.DatetimeIndex
regressor.Regressor
pandas.Series.nunique
convert_float_and_dates.delayed
tuples.dict.items
col.df_train.dropna
pandas.concat.to_hdf
y.apply
str
ValueError
version_file.read
self.__K.values
serie_to_df.dayofweek.astype
self.__plot_feature_importances
space.keys
self.__classifier.get_params
df_train.drop_duplicates.drop_duplicates
numpy.exp
p.startswith
numpy.intersect1d
range
mock.Mock
numpy.random.seed
self.level_estimator.predict
self.__regress_params.items
params.keys
numpy.abs
sklearn.pipeline.Pipeline
serie_to_df.second.astype
self.__classif_params.items
df.value_counts
pandas.DataFrame
sklearn.model_selection.cross_val_score
serie_to_df.minute.astype
sklearn.pipeline.Pipeline.fit
self.level_estimator.predict_proba
self.__regressor.fit
reg.fit
serie.pandas.DatetimeIndex.minute.astype
filter
y_train.drop.value_counts
lightgbm.LGBMClassifier
self.__regressor.transform
mlbox.prediction.Predictor
self.get_params
tensorflow.keras.layers.Embedding
est.get_estimator.get_params
col.self.__K.Reshape
os.mkdir
drift.DriftThreshold.fit
sklearn.model_selection.StratifiedKFold.split
model.regression.regressor.Regressor.get_params
pickle.load
tensorflow.keras.layers.Dropout
numpy.int
sum
model.regression.stacking_regressor.StackingRegressor
reg.get_params
pp.set_params.set_params
numpy.sort
sklearn.model_selection.cross_val_predict
matplotlib.pyplot.yticks
serie.apply.tolist
pandas.concat.keys
self.__classifier.predict
fh.read.splitlines
params.items
reg.predict
matplotlib.pyplot.barh
params.update
est.feature_importances.values
pandas.DataFrame.idxmax
encoding.na_encoder.NA_encoder.get_params
list.x.type.serie.apply.sum
pandas.SparseDataFrame
model.classification.feature_selector.Clf_feature_selector
convert_list.delayed
self.__imp.transform
self.__set_classifier
self.__classifier.fit
sklearn.linear_model.Ridge
self.n_jobs.Parallel
open.write
operator.itemgetter
ds.drifts.items
dropList.append
numpy.sum
sorted
sklearn.ensemble.ExtraTreesClassifier
df_train.shape.df_train.isnull.sum.sort_values.max
model.get_params.items
serie.pandas.DatetimeIndex.second.astype
drift_estimator.DriftEstimator.score
self.get_estimator.estimator_weights_.sum
mlbox.prediction.Predictor.fit_predict
model.classification.stacking_classifier.StackingClassifier
self.__Lcat.df.fillna
col.df_train.nunique
df_train.sample
self.__regressor.score
model.regression.feature_selector.Reg_feature_selector.get_params
pandas.concat
pandas.concat.values
sklearn.metrics.SCORERS.keys
matplotlib.pyplot.show
sklearn.ensemble.BaggingClassifier
model.get_estimator.get_params
model.classification.classifier.Classifier
S.append
pp.set_params.fit
stck.STCK.get_params.copy
open
est.get_estimator.get_params.items
fh.read
tensorflow.keras.models.Model.compile
clf.fit
max
numpy.log
sklearn.ensemble.AdaBoostClassifier
sklearn.preprocessing.LabelEncoder
importance_bag.append
serie_to_df.month.astype
int
enumerate
self.__cross_val_predict_proba
get_embeddings
self.__imp.fit
df_train.shape.df_train.isnull.sum.sort_values
sync_fit
y_train.nunique.Dense
sklearn.linear_model.LogisticRegression
serie_to_df.day.astype
sklearn.linear_model.Lasso
min
set
df_test.sample
df_train.drop_duplicates.isnull
df_train.std
numpy.random.shuffle
hyperopt.fmin.items
tensorflow.keras.layers.Input
serie.pandas.DatetimeIndex.month.astype
pandas.get_dummies
pandas.to_datetime
pandas.Series
mlbox.optimisation.Optimiser.optimise
self.__classifier.predict_proba
sklearn.ensemble.RandomForestClassifier.fit
self.level_estimator.fit
sys.path.insert
self.__regressor.get_params
model.regression.regressor.Regressor.get_estimator
serie.pandas.DatetimeIndex.hour.astype
setuptools.setup
df_test.index.nunique
self.__Lcat.df_train.isnull
sklearn.impute.SimpleImputer
sklearn.preprocessing.LabelEncoder.fit_transform
pandas.DataFrame.to_csv
pandas.datetime.serie_to_df.total_seconds
embeddings.append
list
col.df_train.unique
self.__regressor.get_params.keys
stck.STCK.get_params
mlbox.optimisation.Optimiser
encoding.na_encoder.NA_encoder
pickle.dump
Mock
sklearn.ensemble.RandomForestRegressor
col.self.__K.col.self.__Enc.len.Embedding
self.get_params.keys
sklearn.ensemble.RandomForestClassifier
joblib.delayed
tensorflow.keras.models.Model.fit
df.drop
df_train.isnull.sum
numpy.zeros
self.__Lcat.df_train.isnull.sum
sys.modules.update
serie.apply.apply
copy.copy
mlbox.preprocessing.Drift_thresholder.fit_transform
df_train.drop_duplicates.to_hdf
joblib.Parallel
numpy.round
tensorflow.keras.models.Model
serie.pandas.DatetimeIndex.year.astype
col.self.__Enc.keys
sklearn.pipeline.Pipeline.transform
df_train.drop_duplicates.values
sklearn.model_selection.StratifiedKFold
df_train.index.nunique
sync_fit.delayed
sklearn.tree.DecisionTreeRegressor
pandas.read_hdf
os.path.dirname
self.clean.drop_duplicates
drift.DriftThreshold.drifts
df.name.pred.apply
pandas.Series.values
matplotlib.use
serie.pandas.DatetimeIndex.day.astype
pandas.read_excel
sklearn.tree.DecisionTreeClassifier
type
dict
drift_estimator.DriftEstimator.fit
numpy.std
numpy.mean
sklearn.ensemble.BaggingRegressor
clf.get_params
os.getcwd
estimator.predict_proba
serie_to_df.year.astype
self.level_estimator.get_params
model.get_params
matplotlib.pyplot.grid
callable
self.__save_feature_importances
min.Dense
mlbox.optimisation.Optimiser.evaluate
model.regression.regressor.Regressor.feature_importances
hyperopt.fmin
drift.DriftThreshold
sklearn.ensemble.AdaBoostRegressor
matplotlib.pyplot.text
y_train.nunique
y_train.index.nunique
matplotlib.pyplot.title
stck.STCK.get_params.copy.keys
pp.set_params.predict_proba
sklearn.metrics.make_scorer
matplotlib.pyplot.close
model.regression.regressor.Regressor
pickle.load.inverse_transform
dropout2.Dropout
drift.DriftThreshold.transform
target_name.df.isnull
var.df_train.nunique
self.__classifier.predict_log_proba
numpy.percentile
sklearn.model_selection.KFold
model.get_estimator
int.Dense
self.fit_transform.drop
matplotlib.pyplot.figure
df.apply
self.evaluate
is_null.df.drop
p.split
convert_float_and_dates
mlbox.preprocessing.Reader
col.df_train.mode
getattr
df_test.df_train.pd.concat.drop
inputs.append
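
The per-library lists earlier in this issue are subsets of this full list. One possible way to extract such subsets is to group calls by their top-level name. This is a hypothetical sketch, not necessarily how those lists were produced, and the `IMPORT_NAMES` mapping is an assumption covering only scikit-learn (imported as sklearn). Calls on local objects, such as self.fit or col.df.apply, cannot be attributed to a package from the name alone and are skipped; a purely name-based grouping may also pick up more calls than the lists above (for example matplotlib.pyplot.savefig).

```python
from collections import defaultdict

# Assumed mapping from distribution name to import name where they differ.
IMPORT_NAMES = {"scikit-learn": "sklearn"}


def group_calls(calls, dependencies):
    """Map each dependency to the sorted, de-duplicated calls under its import name."""
    prefixes = {IMPORT_NAMES.get(dep, dep): dep for dep in dependencies}
    grouped = defaultdict(set)
    for call in calls:
        top = call.split(".", 1)[0]  # e.g. 'sklearn' from 'sklearn.impute.SimpleImputer'
        if top in prefixes:
            grouped[prefixes[top]].add(call)
    return {dep: sorted(names) for dep, names in grouped.items()}


# A few entries taken from the list above.
sample_calls = ["matplotlib.use", "joblib.delayed", "joblib.Parallel",
                "sklearn.impute.SimpleImputer", "joblib.delayed",
                "self.fit", "pandas.read_csv"]
print(group_calls(sample_calls, ["matplotlib", "joblib", "scikit-learn"]))
```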

@developer Could you please help me check this issue? May I submit a pull request to fix it? Thank you very much.

AxeldeRomblay / MLBox

Project dependencies may have API risk issues #141