andreshyer closed this issue 4 years ago.
I recall that dicts can be exported. The issue with the `param_grid` is that the skopt framework uses its own data types, which are not JSON compatible.
If dict objects can be exported, would it be possible to force the `param_grid` to become a plain dict? Would that break other parts of the code?
I think it already is, more or less, a dict. The issue is that the values inside the dict are unusual types, like `Integer()` ranges instead of `int()`. For example:
```python
from skopt.space import Categorical, Integer, Real

bayes_grid = {
    'kernel': Categorical(['rbf', 'poly', 'linear']),
    'C': Real(10 ** -3, 10 ** 2, 'log-uniform'),
    'gamma': Real(10 ** -3, 10 ** 0, 'log-uniform'),
    'epsilon': Real(0.1, 0.6),
    'degree': Integer(1, 5)
}
```
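One way to keep `param_grid` exportable without changing how it is built would be to pass a `default=` hook to `json.dumps` that turns each skopt dimension into a plain dict. This is a minimal sketch, assuming the dimensions expose the `low`/`high`/`prior` and `categories`/`prior` attributes that show up in their printed reprs (e.g. `Integer(low=100, high=2000)`). `dimension_to_dict` and `_plain` are hypothetical helper names, not part of skopt.

```python
import json


def _plain(value):
    """Convert NumPy scalars (anything with .item()) to built-in Python values."""
    return value.item() if hasattr(value, "item") else value


def dimension_to_dict(obj):
    """`default=` hook for json.dumps: describe a search-space dimension as a plain dict.

    Duck-typed on the attributes skopt dimensions show in their reprs (assumption):
    Categorical has `categories`; Integer and Real have `low` and `high`.
    """
    if hasattr(obj, "categories"):
        return {
            "type": type(obj).__name__,
            "categories": [_plain(c) for c in obj.categories],
            "prior": getattr(obj, "prior", None),
        }
    if hasattr(obj, "low") and hasattr(obj, "high"):
        return {
            "type": type(obj).__name__,
            "low": _plain(obj.low),
            "high": _plain(obj.high),
            "prior": getattr(obj, "prior", None),
        }
    # json.dumps expects the hook to raise TypeError for anything it cannot handle
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
```

With skopt installed, `json.dumps(bayes_grid, default=dimension_to_dict)` should then produce a JSON object in which, for example, `'degree'` maps to an object with `"type": "Integer"`, `"low": 1` and `"high": 5`. Plain values such as ints and strings never reach the hook.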
Yeah, I noticed that the dict was not a normal dict. I added a little bit of code to try to debug it:
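```python
import json

import numpy as np
import pandas as pd
from tqdm import tqdm

# Debug version of the export loop; `d`, `objs`, `dfs` and `self` come from the enclosing method.
for k, v in tqdm(d.items(), desc="Export to JSON", position=0):
    if isinstance(v, (pd.DataFrame, pd.Series)):
        objs.append(k)
        dfs.append(k)
        getattr(self, k).to_json(path_or_buf=self.run_name + '_' + k + '.json')
    if isinstance(v, dict):
        try:
            print(k, v)
            with open(self.run_name + '_' + k + '.json', 'w') as f:
                # json.dumps() only builds a string; json.dump() writes it to the file
                json.dump(dict(v), f)
        except (TypeError, ValueError):
            # e.g. skopt Integer/Categorical or np.ndarray values are not JSON serializable
            print(f'FAIL {k} : {v}')
            objs.append(k)
    # type(None) instead of NoneType, which is not a builtin name
    if not isinstance(v, (int, float, tuple, list, np.ndarray, bool, str, type(None))):
        objs.append(k)
```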
This produces the following output:
```
param_grid {'n_estimators': Integer(low=100, high=2000), 'max_features': Categorical(categories=('auto', 'sqrt'), prior=None), 'max_depth': Integer(low=1, high=30), 'min_samples_split': Integer(low=2, high=30), 'min_samples_leaf': Integer(low=2, high=30), 'bootstrap': Categorical(categories=(True, False), prior=None)}
FAIL param_grid : {'n_estimators': Integer(low=100, high=2000), 'max_features': Categorical(categories=('auto', 'sqrt'), prior=None), 'max_depth': Integer(low=1, high=30), 'min_samples_split': Integer(low=2, high=30), 'min_samples_leaf': Integer(low=2, high=30), 'bootstrap': Categorical(categories=(True, False), prior=None)}
params {'bootstrap': True, 'max_depth': 30, 'max_features': 'auto', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}
predictions_stats {'r2_raw': array([0.8920112 , 0.89579143, 0.89603234, 0.89066064, 0.8926571 ]), 'r2_avg': 0.8934305428521252, 'r2_std': 0.002127357798190582, 'mse_raw': array([0.47072868, 0.45425046, 0.45320033, 0.47661584, 0.46791318]), 'mse_avg': 0.4645416995678035, 'mse_std': 0.009273261153887694, 'rmse_raw': array([0.6860967 , 0.67398105, 0.67320155, 0.6903737 , 0.6840418 ]), 'rmse_avg': 0.6815389605477208, 'rmse_std': 0.0068077032350010724, 'time_raw': array([2.16756701, 2.35749483, 2.13454795, 2.11755943, 2.14869523]), 'time_avg': 2.1851728916168214, 'time_std': 0.08771533743409417}
FAIL predictions_stats : {'r2_raw': array([0.8920112 , 0.89579143, 0.89603234, 0.89066064, 0.8926571 ]), 'r2_avg': 0.8934305428521252, 'r2_std': 0.002127357798190582, 'mse_raw': array([0.47072868, 0.45425046, 0.45320033, 0.47661584, 0.46791318]), 'mse_avg': 0.4645416995678035, 'mse_std': 0.009273261153887694, 'rmse_raw': array([0.6860967 , 0.67398105, 0.67320155, 0.6903737 , 0.6840418 ]), 'rmse_avg': 0.6815389605477208, 'rmse_std': 0.0068077032350010724, 'time_raw': array([2.16756701, 2.35749483, 2.13454795, 2.11755943, 2.14869523]), 'time_avg': 2.1851728916168214, 'time_std': 0.08771533743409417}
```
It is failing on `param_grid`, `params`, and `predictions_stats`, which all have unusual value types inside the dicts.
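The `predictions_stats` failure has a different cause from the `param_grid` one: its `_raw` entries print as NumPy arrays (`array([...])`), which the `json` module cannot serialize. NumPy integer scalars such as `np.int64` are rejected too, which might explain `params` if its values come back as NumPy types (unverified). This is a minimal sketch of a recursive converter that could be applied before `json.dump`; `to_jsonable` is a hypothetical helper, not part of the code base.

```python
import json

import numpy as np


def to_jsonable(obj):
    """Recursively replace NumPy arrays/scalars with JSON-friendly built-ins."""
    if isinstance(obj, dict):
        # JSON object keys must be strings
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()  # arrays become (nested) lists of Python numbers
    if isinstance(obj, np.generic):
        return obj.item()  # np.float64, np.int64, np.bool_ -> float, int, bool
    return obj
```

Converting up front, rather than through a `default=` hook, also normalizes tuples to lists, so a dict that is written and read back compares equal to the converted version.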
I do have a question. I see you are using pickle objects as a checkpoint between calculating features and hyperparameter tuning. Could we use pickle objects to store this data?
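Pickle would sidestep the type problem, because `pickle` can store arbitrary Python objects such as NumPy arrays and (typically) the skopt dimension objects, at the cost of files that are Python-only and should only be loaded from trusted sources. A minimal round-trip sketch; `save_pickle` and `load_pickle` are hypothetical helper names.

```python
import pickle


def save_pickle(obj, path):
    """Write any picklable object (dicts, NumPy arrays, search spaces, ...) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load an object written by save_pickle. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `save_pickle(self.param_grid, self.run_name + '_param_grid.pkl')` would store the grid exactly as it is, with no conversion step.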
You are right, the other dicts save to JSON files just fine after being passed through `dict()`. It is strange that `param_grid` is misbehaving. How is it being generated?
`param_grid` is generated in `grid.py` by manual entry.
**Describe the bug**
In the development branch, the param grid, as well as other dict-type objects, cannot be exported to JSON.

**To Reproduce**
Run lines 173-185 in `models.py` and follow the code leading to `storage.py`.

**Proposed solution**
Export the dict objects (`param_grid`, `params`, etc.) as their own files, or use another format to export and save the data. Perhaps pickle objects would be useful?
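The proposed solution could combine both ideas: one file per dict, written as JSON when the dict serializes and as a pickle otherwise. This is a minimal sketch; `export_dicts` and its `out_dir` parameter are hypothetical, and the `run_name + '_' + key` file naming is borrowed from the export loop in this thread.

```python
import json
import os
import pickle


def export_dicts(run_name, dicts, out_dir="."):
    """Write each dict to its own file: JSON if it serializes, otherwise pickle.

    Returns {key: path} so the caller can record which format each dict used.
    """
    paths = {}
    for key, value in dicts.items():
        base = os.path.join(out_dir, f"{run_name}_{key}")
        try:
            # Serialize before opening the file so a failure leaves no empty .json behind
            text = json.dumps(dict(value))
        except (TypeError, ValueError):
            path = base + ".pkl"
            with open(path, "wb") as f:
                pickle.dump(value, f)
        else:
            path = base + ".json"
            with open(path, "w") as f:
                f.write(text)
        paths[key] = path
    return paths
```

Building the JSON string before opening the file is the main design choice here: the debug loop opens the `.json` file first, so a dict that fails to serialize still leaves an empty file on disk.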