Open chrisbarber opened 4 years ago
ah... yeah cool good that they 'fully' follow sklearn api, although i was hoping for that :)
yeah actually after fit would be interesting.... i reckon we'll have to do some sort of special case for it, as in GridCV, to get all the details of the various models that were fit etc
don't know what's going on here.. tried this package on two OS's now, different versions of it, examples from the website and from the repo, different versions of swig, looking through bug reports... i am getting segfaults, runtime errors, scripts that just don't end after leaving them.. somebody has a docker image i guess but it's not official. what is the deal with this package? i mean.. i can dig in more but i'm just wondering how many people have a working set up of this and on what systems. i've tried on macos 10.15.4 and linux 5.6.0-1 (debian)
https://hub.docker.com/r/mfeurer/auto-sklearn/ should work
Okay, I fixed my issue.
Unfortunately auto-sklearn get_params returns the exact same thing before and after fit and predict, at least for this example (regression). So the to_mls succeeds afterwards of course, but I won't paste the output because it is identical to the above.
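For context, identical output before and after fit is what the sklearn convention would lead one to expect: get_params reports constructor arguments, while learned state sits on separate attributes (conventionally with a trailing underscore). A toy sketch in plain Python; ToyRegressor is a made-up class, not anything from auto-sklearn or sklearn:

```python
class ToyRegressor:
    """Made-up estimator that mimics the sklearn get_params convention."""

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage

    def get_params(self, deep=True):
        # Only constructor arguments are reported, never learned state.
        return {"shrinkage": self.shrinkage}

    def fit(self, X, y):
        # Learned state goes on a trailing-underscore attribute.
        self.mean_ = sum(y) / len(y) * (1.0 - self.shrinkage)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


model = ToyRegressor(shrinkage=0.5)
before = model.get_params()
model.fit([[0], [1]], [2.0, 4.0])
after = model.get_params()
print(before == after)  # the fitted state is not visible through get_params
```

So whatever describes the trained models has to come from somewhere other than the top-level get_params.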
yea but the question is what's the reference for the actual trained models.... after fit
so like basically: https://github.com/automl/auto-sklearn/blob/master/autosklearn/estimators.py#L329 self._automl will have the reference for the trained machines.... and those should be iterated over and get_params()-ed and check the evaluation metrics result etc. and that should be exported into the jsonld
any update on this?
sorry been a bit irregular w/ splitting time w/ another project. i'll catch up at some point
fyi i get:
>>> automl._automl[0].get_params()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/barberc/software/anaconda/envs/auto-sklearn/lib/python3.8/site-packages/sklearn/base.py", line 189, in get_params
for key in self._get_param_names():
File "/Users/barberc/software/anaconda/envs/auto-sklearn/lib/python3.8/site-packages/sklearn/base.py", line 164, in _get_param_names
raise RuntimeError("scikit-learn estimators should always "
RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class 'autosklearn.automl.AutoMLRegressor'> with constructor (self, *args, **kwargs) doesn't follow this convention.
furthermore ._automl is a private attribute...
It seems like this functionality should live in auto-sklearn, and this weird pickiness of sklearn about subclass __init__ arguments should also be addressed there.
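To illustrate what the RuntimeError above is complaining about, here is a stdlib-only sketch of a signature check of that kind. It is a simplified stand-in written for this thread, not sklearn's actual code, and the names get_param_names, Explicit and Opaque are made up:

```python
import inspect


def get_param_names(cls):
    """Simplified imitation of the signature check behind the traceback."""
    signature = inspect.signature(cls.__init__)
    names = []
    for param in signature.parameters.values():
        if param.name == "self" or param.kind == param.VAR_KEYWORD:
            continue
        if param.kind == param.VAR_POSITIONAL:
            # With *args there is no way to recover parameter names.
            raise RuntimeError(
                f"{cls.__name__} uses *args in __init__, so its "
                "parameters cannot be discovered from the signature"
            )
        names.append(param.name)
    return sorted(names)


class Explicit:
    def __init__(self, seed=None, precision=32):
        self.seed = seed
        self.precision = precision


class Opaque:
    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs


print(get_param_names(Explicit))
try:
    get_param_names(Opaque)
except RuntimeError as err:
    print("rejected:", err)
```

A constructor shaped like Opaque is the (self, *args, **kwargs) case named in the error message, which is why the call on the inner object fails.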
> furthermore ._automl is a private attribute...
in python that's all just convention.... right? there's nothing to enforce its 'private-ness', see https://docs.python.org/3.7/tutorial/classes.html#tut-private
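A small sketch of that convention, using a made-up Wrapper class (the attribute values are just labels):

```python
class Wrapper:
    def __init__(self):
        self._automl = "single underscore: private by convention only"
        self.__hidden = "double underscore: name-mangled, still reachable"


w = Wrapper()
convention_only = w._automl              # works, nothing stops the access
mangled = w._Wrapper__hidden             # the mangled name is still accessible
has_plain_name = hasattr(w, "__hidden")  # only the mangled name exists
```

The underscore still signals that the attribute can change between releases without notice, which is the practical risk of depending on it.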
and fyi:
automl.get_models_with_weights()[0][1].get_params()
and so forth and so on.... basically that contains what the pipeline looks like; each part's parametrization is available as the value of the config key, see:
{'config': Configuration:
balancing:strategy, Value: 'none'
classifier:__choice__, Value: 'random_forest'
classifier:random_forest:bootstrap, Value: 'True'
classifier:random_forest:criterion, Value: 'gini'
classifier:random_forest:max_depth, Constant: 'None'
classifier:random_forest:max_features, Value: 0.48772464140872207
classifier:random_forest:max_leaf_nodes, Constant: 'None'
classifier:random_forest:min_impurity_decrease, Constant: 0.0
classifier:random_forest:min_samples_leaf, Value: 1
classifier:random_forest:min_samples_split, Value: 16
classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'no_encoding'
data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.010000000000000004
data_preprocessing:numerical_transformer:imputation:strategy, Value: 'most_frequent'
data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'normalize'
feature_preprocessor:__choice__, Value: 'polynomial'
feature_preprocessor:polynomial:degree, Value: 2
feature_preprocessor:polynomial:include_bias, Value: 'False'
feature_preprocessor:polynomial:interaction_only, Value: 'False',
'dataset_properties': {'task': 1,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'classification',
'signed': False},
'exclude': {},
'include': {},
'init_params': {'instance': '{"task_id": "breast_cancer"}'},
'random_state': <mtrand.RandomState at 0x7fea9bebc240>,
'steps': [('data_preprocessing',
DataPreprocessor(categorical_features=None, config=None,
dataset_properties=None, exclude=None,
force_sparse_output=None, include=None, init_params=None,
pipeline=None, random_state=None)),
('balancing', Balancing(random_state=None, strategy='none')),
('feature_preprocessor',
<autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7feaa097c198>),
['classifier',
<autosklearn.pipeline.components.classification.ClassifierChoice at 0x7feaa097cc88>]],
'data_preprocessing': DataPreprocessor(categorical_features=None, config=None,
dataset_properties=None, exclude=None,
force_sparse_output=None, include=None, init_params=None,
pipeline=None, random_state=None),
'balancing': Balancing(random_state=None, strategy='none'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7feaa097c198>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7feaa097cc88>,
'data_preprocessing__categorical_features': None,
'data_preprocessing__config': None,
'data_preprocessing__dataset_properties': None,
'data_preprocessing__exclude': None,
'data_preprocessing__force_sparse_output': None,
'data_preprocessing__include': None,
'data_preprocessing__init_params': None,
'data_preprocessing__pipeline': None,
'data_preprocessing__random_state': None,
'balancing__random_state': None,
'balancing__strategy': 'none'}
but of course you can extract the model's params directly as well:
automl.get_models_with_weights()[0][1].get_params()['classifier'].choice.get_params()
{'bootstrap': True,
'class_weight': None,
'criterion': 'gini',
'max_depth': None,
'max_features': 0.48772464140872207,
'max_leaf_nodes': None,
'min_impurity_decrease': 0.0,
'min_samples_leaf': 1,
'min_samples_split': 16,
'min_weight_fraction_leaf': 0.0,
'n_jobs': 1,
'random_state': <mtrand.RandomState at 0x7feaa06c27e0>}
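Putting the two calls above together, a loop over the ensemble could look roughly like this. It is a sketch that assumes get_models_with_weights() yields (weight, pipeline) pairs as in the indexing above; collect_ensemble_params and the Fake* classes are made-up stand-ins so the snippet runs without auto-sklearn installed:

```python
def collect_ensemble_params(automl):
    """Return one record per ensemble member of a fitted estimator."""
    records = []
    for weight, pipeline in automl.get_models_with_weights():
        params = pipeline.get_params()
        record = {"weight": weight, "config": params.get("config")}
        # Assumption: pipelines without a 'classifier' step are skipped here.
        classifier = params.get("classifier")
        if classifier is not None:
            record["classifier_params"] = classifier.choice.get_params()
        records.append(record)
    return records


# Made-up stand-ins that mimic the shapes shown above.
class FakeModel:
    def get_params(self):
        return {"criterion": "gini", "min_samples_split": 16}


class FakeChoice:
    choice = FakeModel()


class FakePipeline:
    def get_params(self):
        return {
            "config": {"classifier:__choice__": "random_forest"},
            "classifier": FakeChoice(),
        }


class FakeAutoML:
    def get_models_with_weights(self):
        return [(1.0, FakePipeline())]


records = collect_ensemble_params(FakeAutoML())
```

With a real fitted estimator in place of FakeAutoML, each record would carry the ensemble weight, the Configuration and the chosen model's parameters.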
Here is some json. Basically the same as the generic sklearn one but with this additional snippet: https://github.com/ratschlab/mlschema-model-converters/blob/75dae1addc7f7d5d17d7d349f2645e737789a3f4/mlsconverters/autosklearn.py#L51-L54 And some handling for various objects that it has in the output of get_params.
{
"identifier": "2497ad25-83f6-410c-ad90-0f8b8d002f74",
"executes": {
"_id": "_:autosklearn.automl.AutoML",
"identifier": "041d0a07-9972-4daa-b2f3-b88d60684399",
"name": null,
"parameters": [
{
"_id": "_:@value",
"@type": "mls:HyperParameter"
}
],
"implements": {
"_id": "_:autosklearn.automl.AutoML",
"@type": "mls:Algorithm"
},
"version": null,
"@type": "mls:Implementation"
},
"input_values": [
{
"value": {
"type": "autosklearn.automl.AutoML",
"params": {
"backend": null,
"debug_mode": null,
"disable_evaluator_output": null,
"ensemble_memory_limit": null,
"ensemble_nbest": null,
"ensemble_size": null,
"exclude_estimators": null,
"exclude_preprocessors": null,
"get_smac_object_callback": null,
"include_estimators": null,
"include_preprocessors": null,
"initial_configurations_via_metalearning": null,
"keep_models": null,
"logging_config": null,
"max_models_on_disc": null,
"metadata_directory": null,
"ml_memory_limit": null,
"per_run_time_limit": null,
"precision": 32,
"resampling_strategy": null,
"resampling_strategy_arguments": null,
"seed": null,
"shared_mode": null,
"smac_scenario_args": null,
"time_left_for_this_task": null
}
},
"specified_by": {
"@id": "_:@value"
},
"@type": "mls:HyperParameterSetting"
}
],
"output_values": [
{
"_id": null,
"value": [
1.0,
{
"@value": {
"type": "autosklearn.pipeline.classification.SimpleClassificationPipeline",
"params": {
"config": {
"balancing:strategy": "none",
"classifier:__choice__": "random_forest",
"data_preprocessing:categorical_transformer:categorical_encoding:__choice__": "one_hot_encoding",
"data_preprocessing:categorical_transformer:category_coalescence:__choice__": "minority_coalescer",
"data_preprocessing:numerical_transformer:imputation:strategy": "mean",
"data_preprocessing:numerical_transformer:rescaling:__choice__": "standardize",
"feature_preprocessor:__choice__": "no_preprocessing",
"classifier:random_forest:bootstrap": "True",
"classifier:random_forest:criterion": "gini",
"classifier:random_forest:max_depth": "None",
"classifier:random_forest:max_features": 0.5,
"classifier:random_forest:max_leaf_nodes": "None",
"classifier:random_forest:min_impurity_decrease": 0.0,
"classifier:random_forest:min_samples_leaf": 1,
"classifier:random_forest:min_samples_split": 2,
"classifier:random_forest:min_weight_fraction_leaf": 0.0,
"data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction": 0.01
},
"dataset_properties": {
"task": 2,
"sparse": false,
"multilabel": false,
"multiclass": true,
"target_type": "classification",
"signed": false
},
"exclude": {},
"include": {},
"init_params": {
"instance": "{\"task_id\": \"e5941b9de02ebe2c0457a6ec6eb35c17\"}"
},
"random_state": [
"MT19937",
[1, 1812433254, 3713160357, 3109174145, 64984499, 3392658084, 446538473, 2629760756, 2453345558, 1394803949, 1021787430, 2063496713, 1304877364, 1713639158, 889001601, 1651239412, 1450863289, 745575081, 361057727, 2288771950, 1463387568, 2249488362, 26637982, 204036717, 1655702041, 1329048465, 2092351466, 1681619666, 3220660315, 1301783610, 626286181, 294669048, 3537128440, 3259518248, 2550101273, 1160881866, 308703547, 295714668, 35508674, 1599247281, 376272024, 3166459937, 1852735737, 3680868867, 612352556, 2760189833, 3816750341, 699140493, 1087846865, 394927937, 2063539671, 645417889, 2337669049, 3773167612, 678121169, 3006984620, 1163491294, 2559287860, 543155592, 3194181347, 2463543297, 3875146860, 475483913, 3707568076, 3881808875, 1264657097, 208126250, 1802809301, 367907560, 2433375693, 2851326449, 2380707878, 2911758972, 4243386879, 2229228726, 828161871, 2871116151, 990638198, 178193628, 1012573979, 1223581943, 3333023583, 1901888414, 3913876750, 3168662389, 656194888, 1553610174, 466840498, 686407570, 280737523, 2476489017, 1272981410, 3189431979, 3294710282, 1564477163, 4133221553, 823708826, 880616227, 1730254897, 335723347, 2123911971, 344194767, 119099153, 2915257116, 3339825470, 2524942970, 1191117250, 3403812186, 3988972937, 2575395295, 4072737183, 663832315, 808080503, 724042340, 2966189542, 2499643239, 3309205581, 1915303227, 72616536, 387525935, 2791701251, 2190905566, 3740328774, 831297460, 3750964864, 2190112044, 899144100, 2346558003, 3851695829, 2896963823, 1548614403, 3676707405, 2050891594, 4165893148, 1883017153, 2668787527, 50330561, 2063572142, 1853585557, 1716111087, 2937248370, 1650859709, 2682305722, 565243175, 3922227187, 3482032705, 2809081500, 2099376873, 230358556, 1065827745, 196966939, 3268845630, 3625508265, 1477799595, 4149453740, 2757835686, 3032697936, 2200108791, 3421680711, 4145382259, 3605253072, 1186485728, 3520482151, 3080733463, 3887314157, 4030447755, 1699987022, 1393253586, 1710066407, 710337383, 3754612557, 
2741088369, 337455371, 1304761604, 3592681639, 3099385187, 4003676405, 317081535, 997754381, 480565460, 3806265432, 1068029852, 776179010, 470617537, 3653875421, 2273571919, 1055365147, 1317172834, 3414733003, 2835400613, 28845217, 631741764, 2334552212, 3565466095, 1225096926, 1277781438, 2416008223, 1268768054, 2750789241, 267768398, 2175383438, 268654341, 2550530755, 2971623408, 1666669894, 1934871760, 509782083, 2798468670, 2834016892, 2494149255, 1965005899, 2653045765, 2317194903, 1297426078, 916214929, 2967861004, 2236807006, 2476725285, 128488253, 4277714156, 3016192551, 1690883702, 1329810641, 593010415, 2341313579, 1754238478, 1242698701, 2152594527, 2103269013, 926178633, 647225267, 4243787142, 1489208161, 3188798921, 1327553793, 3644600811, 684513652, 2606555057, 2705329549, 2557469018, 1294205096, 70104222, 3020083528, 2015571237, 2768573480, 401698695, 2812362809, 328919870, 984940142, 1653817439, 471643152, 538942283, 2040555667, 1211982999, 1663497772, 2941793728, 3001026698, 313271977, 3644502703, 2423950047, 2629046069, 3450826936, 44600781, 2633869288, 4267014746, 4204914470, 1955987363, 2590608885, 2120168063, 1460034243, 258056600, 3693550087, 779446436, 902696389, 4228701387, 3165791227, 3478614865, 1500865135, 905884796, 3682046467, 2437847832, 2595888219, 4144484663, 1299603103, 648536946, 1762836247, 4265749196, 950840266, 2928992722, 2051369009, 2071186450, 1164619682, 210405235, 1296628868, 2425474719, 4083386904, 1978331343, 3190898799, 602128683, 2003319330, 1043377147, 756690484, 24776626, 1835824233, 1156421176, 2125448878, 1333136189, 607751135, 4255614767, 4238533009, 2583175632, 230472465, 3037259757, 1546348932, 2537279411, 110471952, 520621708, 63613561, 2843673595, 775036, 1899744556, 1168115970, 2685086321, 3410250658, 3151102153, 634647644, 3639125394, 3344624764, 1525171811, 1878800371, 3356530116, 3676542926, 602053165, 2686708238, 3703555082, 3754961372, 3970030923, 1749014201, 3391107050, 2478152000, 2121779806, 
2636689360, 769835312, 4230539591, 1909812524, 417081626, 3096519324, 387659697, 3764499249, 3452925463, 3818277698, 3008920324, 15253694, 1479260759, 2421328720, 2220743357, 38831551, 1032912064, 3400956198, 2362808832, 3988706866, 1950464958, 3248573125, 1225815945, 1211036180, 346407094, 3867176764, 1257086026, 2725236231, 2843735658, 4147241082, 1729974832, 1256499145, 3765975901, 784776076, 4288277427, 3903532520, 3431522864, 2792589977, 2935989154, 3536596892, 3512984120, 605476293, 1774961976, 981422589, 822525778, 3343539932, 422954622, 1323482938, 2523465420, 2746609356, 1664448205, 272567300, 711582493, 3625722107, 3615865699, 950619756, 2864168489, 108006277, 3976313352, 680217319, 173747636, 291134870, 198587329, 595310009, 941470866, 2438488368, 1681923153, 1654783272, 3531789254, 4149541715, 2922706987, 684907209, 3116688362, 3288142886, 3953377592, 3332428007, 1400401813, 3745921798, 1701705628, 3744511893, 1838265811, 3314032512, 3894840150, 3810031409, 181324387, 983160249, 1444959400, 3836664153, 3032673327, 310789231, 3701565562, 1407580781, 2511575629, 3113822685, 1777261998, 2208898751, 106383174, 2961020500, 995776421, 3306087121, 2181030035, 2300064751, 1909543740, 4023156173, 1671619075, 2151956104, 237668401, 3204511253, 1303668692, 3868259787, 2737897899, 4091026033, 2877780671, 134376279, 398912026, 863520778, 3712468923, 3443213666, 2183809552, 2597379302, 349776833, 274697715, 4266593710, 4282186769, 3530757867, 520237914, 3369037397, 2285670338, 387086485, 618942879, 219892882, 2008897906, 2293749560, 2907436476, 3853296593, 327550390, 1558751403, 2125694704, 1822570484, 2409968265, 436622776, 2691124090, 1080819771, 2958107334, 2667158841, 2117901613, 440045635, 3861104471, 3574962701, 3210299248, 1368601573, 2434039520, 86704919, 3628108033, 1909858745, 227461000, 2530509465, 838433817, 730224848, 1060658180, 1318482825, 233266846, 2352800845, 2086493219, 3826355555, 3174377690, 1455208243, 1356597942, 663563056, 2501819374, 
4213535259, 1585241464, 873997246, 2597898744, 427064229, 1587746589, 259660817, 1688808891, 4165834345, 1359025114, 2013923952, 2963511711, 2903220732, 356112706, 501549847, 1609412897, 1685128111, 2639303606, 700554261, 914150235, 2010650618, 2029243163, 3046509911, 715702687, 2206956754, 3045298216, 2922667179, 2497577415, 3001819604, 706666890, 2275923855, 3094184383, 2781697712, 3292952666, 4238614078, 278500659, 1440033346, 1552714131, 336554687, 2842580609, 2255044310, 2180071372, 99970159, 2078552309, 1172694639, 1359399314, 546452524, 349053834, 3072254369, 3043246719, 3314426498, 1594992663, 3582269665, 2114045278, 585873328, 840739494, 3475778485, 1506518790, 4008486652, 229989333, 3582278212, 363921215, 3592842520, 1833533669, 708173875, 564248927, 853943228, 2282731374, 2874158047, 3978663285, 2332696531, 1354524859, 58121641, 1445193461, 1936635021, 3374328198, 3465253060, 385589199, 1819596280, 912895627, 1877426726, 733280947, 2004202992, 3311780711, 3732053191, 309903272, 97290141, 2945419335, 3916477072, 1326195031, 3740938055, 3604745262, 3633308956, 3392929431, 1257547457, 251825182, 3318700085, 847033774, 137350663, 1716455973, 546850455, 4227574519, 3044214953, 2259874013, 2442748258, 2956971336, 2198772379, 1269686727, 2648116105, 1339159363, 1473334647, 2386671612, 2069268389],
624,
0,
0.0
],
"steps": [
[
"data_preprocessing",
{
"@value": {
"type": "autosklearn.pipeline.components.data_preprocessing.data_preprocessing.DataPreprocessor",
"params": {
"categorical_features": null,
"config": null,
"dataset_properties": null,
"exclude": null,
"force_sparse_output": null,
"include": null,
"init_params": null,
"pipeline": null,
"random_state": null
}
}
}
],
[
"balancing",
{
"@value": {
"type": "autosklearn.pipeline.components.data_preprocessing.balancing.balancing.Balancing",
"params": {
"random_state": null,
"strategy": "none"
}
}
}
],
[
"feature_preprocessor",
{
"densifier": "Densifier",
"extra_trees_preproc_for_classification": "ExtraTreesPreprocessorClassification",
"extra_trees_preproc_for_regression": "ExtraTreesPreprocessorRegression",
"fast_ica": "FastICA",
"feature_agglomeration": "FeatureAgglomeration",
"kernel_pca": "KernelPCA",
"kitchen_sinks": "RandomKitchenSinks",
"liblinear_svc_preprocessor": "LibLinear_Preprocessor",
"no_preprocessing": "NoPreprocessing",
"nystroem_sampler": "Nystroem",
"pca": "PCA",
"polynomial": "PolynomialFeatures",
"random_trees_embedding": "RandomTreesEmbedding",
"select_percentile_classification": "SelectPercentileClassification",
"select_percentile_regression": "SelectPercentileRegression",
"select_rates": "SelectRates",
"truncatedSVD": "TruncatedSVD"
}
],
[
"classifier",
{
"adaboost": "AdaboostClassifier",
"bernoulli_nb": "BernoulliNB",
"decision_tree": "DecisionTree",
"extra_trees": "ExtraTreesClassifier",
"gaussian_nb": "GaussianNB",
"gradient_boosting": "GradientBoostingClassifier",
"k_nearest_neighbors": "KNearestNeighborsClassifier",
"lda": "LDA",
"liblinear_svc": "LibLinear_SVC",
"libsvm_svc": "LibSVM_SVC",
"multinomial_nb": "MultinomialNB",
"passive_aggressive": "PassiveAggressive",
"qda": "QDA",
"random_forest": "RandomForest",
"sgd": "SGD"
}
]
],
"data_preprocessing": {
"@value": {
"type": "autosklearn.pipeline.components.data_preprocessing.data_preprocessing.DataPreprocessor",
"params": {
"categorical_features": null,
"config": null,
"dataset_properties": null,
"exclude": null,
"force_sparse_output": null,
"include": null,
"init_params": null,
"pipeline": null,
"random_state": null
}
}
},
"balancing": {
"@value": {
"type": "autosklearn.pipeline.components.data_preprocessing.balancing.balancing.Balancing",
"params": {
"random_state": null,
"strategy": "none"
}
}
},
"feature_preprocessor": {
"densifier": "Densifier",
"extra_trees_preproc_for_classification": "ExtraTreesPreprocessorClassification",
"extra_trees_preproc_for_regression": "ExtraTreesPreprocessorRegression",
"fast_ica": "FastICA",
"feature_agglomeration": "FeatureAgglomeration",
"kernel_pca": "KernelPCA",
"kitchen_sinks": "RandomKitchenSinks",
"liblinear_svc_preprocessor": "LibLinear_Preprocessor",
"no_preprocessing": "NoPreprocessing",
"nystroem_sampler": "Nystroem",
"pca": "PCA",
"polynomial": "PolynomialFeatures",
"random_trees_embedding": "RandomTreesEmbedding",
"select_percentile_classification": "SelectPercentileClassification",
"select_percentile_regression": "SelectPercentileRegression",
"select_rates": "SelectRates",
"truncatedSVD": "TruncatedSVD"
},
"classifier": {
"adaboost": "AdaboostClassifier",
"bernoulli_nb": "BernoulliNB",
"decision_tree": "DecisionTree",
"extra_trees": "ExtraTreesClassifier",
"gaussian_nb": "GaussianNB",
"gradient_boosting": "GradientBoostingClassifier",
"k_nearest_neighbors": "KNearestNeighborsClassifier",
"lda": "LDA",
"liblinear_svc": "LibLinear_SVC",
"libsvm_svc": "LibSVM_SVC",
"multinomial_nb": "MultinomialNB",
"passive_aggressive": "PassiveAggressive",
"qda": "QDA",
"random_forest": "RandomForest",
"sgd": "SGD"
},
"data_preprocessing__categorical_features": null,
"data_preprocessing__config": null,
"data_preprocessing__dataset_properties": null,
"data_preprocessing__exclude": null,
"data_preprocessing__force_sparse_output": null,
"data_preprocessing__include": null,
"data_preprocessing__init_params": null,
"data_preprocessing__pipeline": null,
"data_preprocessing__random_state": null,
"balancing__random_state": null,
"balancing__strategy": "none"
}
}
}
],
"specified_by": {
"@id": "_:automl"
},
"@type": "mls:ModelEvaluation"
}
],
"realizes": null,
"version": null,
"name": null,
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"dcterms": "http://purl.org/dc/terms/",
"executes": {
"@id": "mls:executes",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"dcterms": "http://purl.org/dc/terms/",
"name": "dcterms:title",
"parameters": {
"@id": "mls:hasHyperParameter",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"implements": {
"@id": "mls:implements",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"version": "dcterms:hasVersion"
}
},
"input_values": {
"@id": "mls:hasInput",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"specified_by": "mls:specifiedBy",
"value": "mls:hasValue"
}
},
"output_values": {
"@id": "mls:hasOutput",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"specified_by": "mls:specifiedBy",
"value": "mls:hasValue"
}
},
"realizes": {
"@id": "mls:implements",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"version": "dcterms:hasVersion",
"name": "dcterms:title"
},
"@type": "mls:Run"
}
And here's a diff, since I didn't create a branch: https://github.com/ratschlab/mlschema-model-converters/compare/ec4817c..master
yeah it's a good start but these things need to be normalized to the schema, meaning that the output values have to represent HyperParameterSettings, if you know what i mean. coz there we have a full machine that has HyperParameter and HyperParameterSettings. i'll try to use the dumped json to show what i mean above
btw if you would do a PR then i could add there some more comments as well
> yeah its a good start but these things need to be normalized to the schema, meaning that the output values has to be representing HyperParameterSettings if you know what i mean. coz there we have a full machine that has HyperParameter and HyperParameterSettings. i'll try to use the dumped json to show what i mean above
I just randomly guessed using ModelEvaluation. If it's as simple as switching that to HyperParameterSettings like with the .input_values I can do that; otherwise yeah I guess I will need some explanation
ok so let's take this part of the generated json:
"classifier:random_forest:bootstrap": "True",
"classifier:random_forest:criterion": "gini",
"classifier:random_forest:max_depth": "None",
"classifier:random_forest:max_features": 0.5,
"classifier:random_forest:max_leaf_nodes": "None",
"classifier:random_forest:min_impurity_decrease": 0.0,
"classifier:random_forest:min_samples_leaf": 1,
"classifier:random_forest:min_samples_split": 2,
"classifier:random_forest:min_weight_fraction_leaf": 0.0,
so this is basically the HyperParameterSetting of a sklearn RandomForest. if you run the converter on a simple sklearn RF you would get something like this:
{
"identifier": "a9156457-114e-4dea-9dfa-37f2b3a587df",
"executes": {
"_id": "_:sklearn.ensemble._forest.RandomForestClassifier",
"identifier": "aac39ab5-c124-4b84-bf85-d36c2d925c56",
"name": null,
"parameters": [{
"_id": "_:bootstrap",
"@type": "mls:HyperParameter"
}, {
"_id": "_:ccp_alpha",
"@type": "mls:HyperParameter"
}, {
"_id": "_:class_weight",
"@type": "mls:HyperParameter"
}, {
"_id": "_:criterion",
"@type": "mls:HyperParameter"
}, {
"_id": "_:max_depth",
"@type": "mls:HyperParameter"
}, {
"_id": "_:max_features",
"@type": "mls:HyperParameter"
}, {
"_id": "_:max_leaf_nodes",
"@type": "mls:HyperParameter"
}, {
"_id": "_:max_samples",
"@type": "mls:HyperParameter"
}, {
"_id": "_:min_impurity_decrease",
"@type": "mls:HyperParameter"
}, {
"_id": "_:min_impurity_split",
"@type": "mls:HyperParameter"
}, {
"_id": "_:min_samples_leaf",
"@type": "mls:HyperParameter"
}, {
"_id": "_:min_samples_split",
"@type": "mls:HyperParameter"
}, {
"_id": "_:min_weight_fraction_leaf",
"@type": "mls:HyperParameter"
}, {
"_id": "_:n_estimators",
"@type": "mls:HyperParameter"
}, {
"_id": "_:n_jobs",
"@type": "mls:HyperParameter"
}, {
"_id": "_:oob_score",
"@type": "mls:HyperParameter"
}, {
"_id": "_:random_state",
"@type": "mls:HyperParameter"
}, {
"_id": "_:verbose",
"@type": "mls:HyperParameter"
}, {
"_id": "_:warm_start",
"@type": "mls:HyperParameter"
}],
"implements": {
"_id": "_:sklearn.ensemble._forest.RandomForestClassifier",
"@type": "mls:Algorithm"
},
"version": null,
"@type": "mls:Implementation"
},
"input_values": [{
"value": {
"@type": "xsd:boolean",
"@value": true
},
"specified_by": {
"@id": "_:bootstrap"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:float",
"@value": 0.0
},
"specified_by": {
"@id": "_:ccp_alpha"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:class_weight"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:string",
"@value": "entropy"
},
"specified_by": {
"@id": "_:criterion"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:max_depth"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:string",
"@value": "auto"
},
"specified_by": {
"@id": "_:max_features"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:max_leaf_nodes"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:max_samples"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:float",
"@value": 0.0
},
"specified_by": {
"@id": "_:min_impurity_decrease"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:min_impurity_split"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:int",
"@value": 1
},
"specified_by": {
"@id": "_:min_samples_leaf"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:int",
"@value": 2
},
"specified_by": {
"@id": "_:min_samples_split"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:float",
"@value": 0.0
},
"specified_by": {
"@id": "_:min_weight_fraction_leaf"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:int",
"@value": 1
},
"specified_by": {
"@id": "_:n_estimators"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:n_jobs"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:boolean",
"@value": false
},
"specified_by": {
"@id": "_:oob_score"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:anyURI",
"@value": null
},
"specified_by": {
"@id": "_:random_state"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:int",
"@value": 0
},
"specified_by": {
"@id": "_:verbose"
},
"@type": "mls:HyperParameterSetting"
}, {
"value": {
"@type": "xsd:boolean",
"@value": false
},
"specified_by": {
"@id": "_:warm_start"
},
"@type": "mls:HyperParameterSetting"
}],
"output_values": [{
"_id": "_:accuracy_score1892606500",
"value": {
"@type": "xsd:double",
"@value": 0.864406779661017
},
"specified_by": {
"_id": "http://www.w3.org/ns/mls#accuracy",
"@type": "mls:EvaluationMeasure"
},
"@type": "mls:ModelEvaluation"
}],
"realizes": {
"_id": "_:sklearn.ensemble._forest.RandomForestClassifier",
"@type": "mls:Algorithm"
},
"version": null,
"name": null,
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"dcterms": "http://purl.org/dc/terms/",
"executes": {
"@id": "mls:executes",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"dcterms": "http://purl.org/dc/terms/",
"name": "dcterms:title",
"parameters": {
"@id": "mls:hasHyperParameter",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"implements": {
"@id": "mls:implements",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"version": "dcterms:hasVersion"
}
},
"input_values": {
"@id": "mls:hasInput",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"specified_by": "mls:specifiedBy",
"value": "mls:hasValue"
}
},
"output_values": {
"@id": "mls:hasOutput",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"specified_by": "mls:specifiedBy",
"value": "mls:hasValue"
}
},
"realizes": {
"@id": "mls:implements",
"@context": {
"mls": "http://www.w3.org/ns/mls#",
"@version": 1.1,
"_id": "@id"
}
},
"version": "dcterms:hasVersion",
"name": "dcterms:title"
},
"@type": "mls:Run"
}
so the idea is that the first one i've quoted should be formulated something like the above, namely have an Implementation that has its HyperParameters, which will have their HyperParameterSettings... and similarly for all the other sklearn components in the pipeline
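As a rough sketch of that shape: given a component's qualified name and a plain dict of its parameters, the Implementation / HyperParameter / HyperParameterSetting parts could be assembled like this. The function names are made up, the xsd mapping only mirrors the types visible in the RandomForest dump above, and falling back to xsd:string for anything else is an assumption:

```python
def xsd_type(value):
    """Map a Python value to the xsd type names seen in the RF dump above."""
    if value is None:
        return "xsd:anyURI"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:int"
    if isinstance(value, float):
        return "xsd:float"
    return "xsd:string"  # assumed fallback for everything else


def to_implementation(qualified_name, params):
    """Build the 'executes' and 'input_values' parts for one component."""
    names = sorted(params)
    executes = {
        "_id": f"_:{qualified_name}",
        "parameters": [
            {"_id": f"_:{name}", "@type": "mls:HyperParameter"} for name in names
        ],
        "implements": {"_id": f"_:{qualified_name}", "@type": "mls:Algorithm"},
        "@type": "mls:Implementation",
    }
    input_values = [
        {
            "value": {"@type": xsd_type(params[name]), "@value": params[name]},
            "specified_by": {"@id": f"_:{name}"},
            "@type": "mls:HyperParameterSetting",
        }
        for name in names
    ]
    return {"executes": executes, "input_values": input_values}


fragment = to_implementation(
    "sklearn.ensemble._forest.RandomForestClassifier",
    {"bootstrap": True, "criterion": "gini", "max_depth": None, "min_samples_split": 2},
)
```

Each pipeline component would get its own fragment of this kind instead of being nested as an opaque value.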
what's the status of mlschema? is it possible to programmatically validate against it yet?
afaik there's currently no json schema defined over it, nor xmlschema.
@vigsterkr can you tell me if you like this json. this does two things:
(1) considers everything that responds to get_params as an mls Run, and all the params as HyperParameterSettings. if things are not Runs then i guess i need to know what they are.
(2) takes that Configuration from autosklearn and instantiates a dummy sklearn model according to it, so it can be converted to the corresponding mls. right now this is hacked-in (hard coded for random forest for this example); want to confirm before generalizing
if it's too hard to confirm the json i can clean up what i have and check it in so that it can be reviewed conceptually, but (1) and (2) above basically explain what i did, and to me it is more efficient to hack certain bits until i know what i am actually trying to produce
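For (2), one way to avoid hard coding random forest might be to read the chosen component and its settings straight out of the flattened Configuration keys. This is only a sketch: config_to_kwargs is a made-up helper, the dict mimics the key format dumped earlier, and it assumes the names after the prefix line up one-to-one with the target model's constructor arguments, which may not hold for every component:

```python
import ast


def config_to_kwargs(config, step="classifier"):
    """Return (choice, kwargs) for the component selected in `step`."""
    choice = config[f"{step}:__choice__"]
    prefix = f"{step}:{choice}:"
    kwargs = {}
    for key, value in config.items():
        if not key.startswith(prefix):
            continue
        if isinstance(value, str):
            # "True" / "None" arrive as strings; plain words like "gini" stay.
            try:
                value = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                pass
        kwargs[key[len(prefix):]] = value
    return choice, kwargs


config = {
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:bootstrap": "True",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_depth": "None",
    "classifier:random_forest:min_samples_split": 2,
    "balancing:strategy": "none",
}
choice, kwargs = config_to_kwargs(config)
```

The resulting kwargs could then be handed to whichever dummy model the choice maps to.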
@chrisbarber i'll check into it asap. in the meanwhile i'll just put together a json schema, as that should be fairly easy to do, and then that could be used to validate outputs in tests as well
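Until such a schema exists, a test could at least do a structural check along these lines. This is a hypothetical stdlib-only sketch, not the planned json schema; check_run and the two sample documents are made up, and the rules only cover what is visible in the dumps above:

```python
def check_run(doc):
    """Return a list of problems found in a dumped mls:Run (empty if none)."""
    problems = []
    if doc.get("@type") != "mls:Run":
        problems.append("top-level @type is not mls:Run")
    declared = {p.get("_id") for p in doc.get("executes", {}).get("parameters", [])}
    for i, setting in enumerate(doc.get("input_values", [])):
        if setting.get("@type") != "mls:HyperParameterSetting":
            problems.append(f"input_values[{i}] is not a HyperParameterSetting")
        ref = setting.get("specified_by", {}).get("@id")
        if ref not in declared:
            problems.append(f"input_values[{i}] references undeclared {ref!r}")
    return problems


good = {
    "@type": "mls:Run",
    "executes": {"parameters": [{"_id": "_:bootstrap", "@type": "mls:HyperParameter"}]},
    "input_values": [
        {
            "value": {"@type": "xsd:boolean", "@value": True},
            "specified_by": {"@id": "_:bootstrap"},
            "@type": "mls:HyperParameterSetting",
        }
    ],
}
# Same document, but the setting points at a parameter that was never declared.
bad = dict(good, input_values=[dict(good["input_values"][0], specified_by={"@id": "_:missing"})])

good_problems = check_run(good)
bad_problems = check_run(bad)
```

Returning a list of messages rather than raising makes it easy to assert on an empty list in a test.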
This could help clean up what I hacked together for (2) above, but not sure yet https://github.com/automl/auto-sklearn/issues/886#issuecomment-653423398
took this example from autosklearn and passed it to to_mls and it produces something, so i guess they support the get_params convention.
this is before calling fit on the model, which segfaults on my mac for this case