AIworx-Labs / chocolate

A fully decentralized hyperparameter optimization framework
http://chocolate.readthedocs.io
BSD 3-Clause "New" or "Revised" License

chocolate CMAES raises an exception with some search spaces #29

Open yangzhg opened 5 years ago

yangzhg commented 5 years ago

My space is:

from chocolate import choice, uniform, log, quantized_uniform

space = {
    'Imputation@skipped': choice([True, False]),
    'Imputation@missing_values': choice(['NaN', 0]),
    'Imputation@strategy': choice(['mean', 'median', 'most_frequent']),
    'PCA@skipped': choice([True, False]),
    'PCA@whiten': choice([True, False]),
    'PCA@svd_solver': choice(['auto', 'full', 'arpack', 'randomized']),
    'JATA': {
        'RigdeC': {
            'RigdeC@alpha': uniform(low=0.0001, high=2),
            'RigdeC@fit_intercept': choice([True, False]),
            'RigdeC@normalize': choice([True, False]),
            'RigdeC@tol': log(low=-5, high=-1, base=10)
        },
        'XGBC': {
            'XGBC@min_child_weight': uniform(low=0, high=20),
            'XGBC@n_estimators': quantized_uniform(low=25, high=525, step=20),
            'XGBC@max_depth': quantized_uniform(low=1, high=20, step=1),
            'XGBC@subsample': uniform(low=0.7, high=1.0),
            'XGBC@learning_rate': uniform(low=0.001, high=1.0),
            'XGBC@colsample_bytree': uniform(low=0.1, high=1.0),
            'XGBC@colsample_bylevel': uniform(low=0.1, high=1.0),
            'XGBC@reg_alpha': log(low=-10, high=-1, base=10),
            'XGBC@reg_lambda': log(low=-10, high=-1, base=10),
            'XGBC@booster': {
                'gbtree': None,
                'XGBC@gblinear': {
                    'XGBC@updater': choice(['shotgun', 'coord_descent']),
                    'XGBC@feature_selector': choice(['cyclic', 'shuffle'])
                },
                'XGBC@dart': {
                    'XGBC@sample_type': choice(['uniform', 'weighted']),
                    'XGBC@normalize_type': choice(['tree', 'forest']),
                    'XGBC@rate_drop': uniform(low=0.0, high=1.0),
                    'XGBC@skip_drop': uniform(low=0.0, high=1.0)
                }
            }
        }
    }
}

And my test code is:

from chocolate import CMAES, SQLiteConnection

conn = SQLiteConnection("sqlite:///my_db.db")
sampler = CMAES(conn, space, clear_db=True)
token, params = sampler.next()
print(params)

It sometimes raises:

Traceback (most recent call last):
  File "/Users/yang/workspace/baidu/bdg/jarvis-automl/automl/test.py", line 87, in <module>
    token, params = sampler.next()
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/base.py", line 159, in next
    return self._next()
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 86, in _next
    ancestors, ancestors_ids = self._load_ancestors(results)
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 194, in _load_ancestors
    candidate["step"] = numpy.array([c[str(k)] for k in self.space.names()])
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 194, in <listcomp>
    candidate["step"] = numpy.array([c[str(k)] for k in self.space.names()])
KeyError: 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normalize_type'

yangzhg commented 5 years ago

This is caused by string truncation. Output of space.names():

['Imputation@missing_values', 'Imputation@skipped', 'Imputation@strategy', 'JATA__subspace', 'JATA_JATA_RigdeC_RigdeC@alpha', 'JATA_JATA_RigdeC_RigdeC@fit_intercept', 'JATA_JATA_RigdeC_RigdeC@normalize', 'JATA_JATA_RigdeC_RigdeC@tol', 'JATA_JATA_XGBC_XGBC@booster__subspace', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normalize_type', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@rate_drop', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@sample_type', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@skip_drop', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@feature_selector', 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@updater', 'JATA_JATA_XGBC_XGBC@colsample_bylevel', 'JATA_JATA_XGBC_XGBC@colsample_bytree', 'JATA_JATA_XGBC_XGBC@learning_rate', 'JATA_JATA_XGBC_XGBC@max_depth', 'JATA_JATA_XGBC_XGBC@min_child_weight', 'JATA_JATA_XGBC_XGBC@n_estimators', 'JATA_JATA_XGBC_XGBC@reg_alpha', 'JATA_JATA_XGBC_XGBC@reg_lambda', 'JATA_JATA_XGBC_XGBC@subsample', 'PCA@skipped', 'PCA@svd_solver', 'PCA@whiten']

Output of conn.all_complementary():

OrderedDict([('id', 1), ('Imputation@missing_values', 0.027475185556735227), ('Imputation@skipped', 0.7021638133954162), ('Imputation@strategy', 0.8260164998309139), ('JATA__subspace', 0.728034117228476), ('JATA_JATA_RigdeC_RigdeC@alpha', 0.8055218384975674), ('JATA_JATA_RigdeC_RigdeC@fit_intercept', 0.3529762705053573), ('JATA_JATA_RigdeC_RigdeC@normalize', 0.5846024876058484), ('JATA_JATA_RigdeC_RigdeC@tol', 0.7424829860073592), ('JATA_JATA_XGBC_XGBC@booster__subspace', 0.23363749041539883), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normali', 0.7762336196367197), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@rate_dr', 0.7900677658761933), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@sample_', 0.28040260890117763), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@skip_dr', 0.820385547747369), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@fea', 0.9146923201302927), ('JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@upd', 0.34535363134317765), ('JATA_JATA_XGBC_XGBC@colsample_bylevel', 0.6172600920902241), ('JATA_JATA_XGBC_XGBC@colsample_bytree', 0.6924336343637797), ('JATA_JATA_XGBC_XGBC@learning_rate', 0.8863954027476345), ('JATA_JATA_XGBC_XGBC@max_depth', 0.1037840280086878), ('JATA_JATA_XGBC_XGBC@min_child_weight', 0.513873297464781), ('JATA_JATA_XGBC_XGBC@n_estimators', 0.9900210962357014), ('JATA_JATA_XGBC_XGBC@reg_alpha', 0.4328215878612225), ('JATA_JATA_XGBC_XGBC@reg_lambda', 0.2583266669387462), ('JATA_JATA_XGBC_XGBC@subsample', 0.5177315714662735), ('PCA@skipped', 0.2512316825591715), ('PCA@svd_solver', 0.7546809426851572), ('PCA@whiten', 0.5356528245177863), ('_ancestor_id', -1), ('_chocolate_id', 0)])

You can see that JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@updater was stored as JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@upd. The string was truncated strangely, but why?
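The mismatch can also be checked mechanically. The sketch below is not chocolate code: find_truncated is a hypothetical helper, and the 63-character limit is an assumption inferred from the stored keys in this thread, each of which is cut at exactly 63 characters. It takes a few of the names and keys posted above and reports which names only match a stored key after truncation.

```python
# Limit inferred from the truncated keys in this thread;
# it is an assumption, not a value taken from chocolate's source.
ASSUMED_LIMIT = 63


def find_truncated(names, stored_keys, limit=ASSUMED_LIMIT):
    """Map each full name to its stored key when they match only after truncation."""
    stored = set(stored_keys)
    mismatches = {}
    for name in names:
        if name not in stored and name[:limit] in stored:
            mismatches[name] = name[:limit]
    return mismatches


# A subset of space.names() as posted above.
names = [
    "JATA_JATA_XGBC_XGBC@subsample",
    "JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normalize_type",
    "JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@updater",
]

# The matching keys from conn.all_complementary() as posted above.
stored_keys = [
    "JATA_JATA_XGBC_XGBC@subsample",
    "JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normali",
    "JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@gblinear_XGBC@upd",
]

for full, short in find_truncated(names, stored_keys).items():
    print(len(full), "->", len(short), short)
```

Run against the full space.names() list and the keys of a stored row, the same comparison should flag all six affected parameters, which are exactly the ones nested under XGBC@booster.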

yangzhg commented 5 years ago

This may be the maximum length limit for column names in SQLite.

leconteur commented 5 years ago

It's possible. We used dataset to interface with SQLite, and it does some strange things under the hood. I'm increasingly considering replacing this dependency, but I'd rather not write a lot of SQL code.

jlevy44 commented 5 years ago

#33

jlevy44 commented 5 years ago

Any updates on this, @leconteur @yangzhg?

I'm also having this issue on conditional search spaces.

jlevy44 commented 5 years ago

https://www.sqlite.org/limits.html
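
One way to test the SQLite hypothesis directly is to bypass chocolate and dataset and create a long column with Python's built-in sqlite3 module. This is only a minimal sketch: the table name results and the 80-character column name are made up for illustration. If the name comes back intact, the truncation is more likely introduced by a layer above SQLite than by SQLite itself.

```python
import sqlite3

# Hypothetical 80-character column name, longer than the 63 characters
# at which the keys in this thread were cut.
long_name = "c" * 80

conn = sqlite3.connect(":memory:")

# Double quotes make SQLite treat the string as an identifier.
conn.execute(f'CREATE TABLE results ("{long_name}" REAL)')
conn.execute(f'INSERT INTO results ("{long_name}") VALUES (?)', (0.5,))

# PRAGMA table_info returns one row per column; index 1 holds the column name.
stored_name = conn.execute("PRAGMA table_info(results)").fetchall()[0][1]

print(len(stored_name), stored_name == long_name)
conn.close()
```

Repeating the check through the same connection layer that chocolate uses would show where the name gets shortened.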