AIworx-Labs / chocolate

A fully decentralized hyperparameter optimization framework
http://chocolate.readthedocs.io
BSD 3-Clause "New" or "Revised" License

KeyError: '_subspace' when using results_as_dataframe and ThompsonSampling+CMAES #40

Open williamjshipman opened 4 years ago

williamjshipman commented 4 years ago

Calling results_as_dataframe works fine if I use a MongoDBConnection and the QuasiRandom sampler. However, changing the sampler to ThompsonSampling causes results_as_dataframe to raise KeyError: '_subspace'. The example code below demonstrates the problem: uncommenting the line that uses ThompsonSampling and commenting out the line that uses QuasiRandom results in the error.

from chocolate import Space, ThompsonSampling, CMAES, SQLiteConnection, QuasiRandom, log, quantized_uniform

s = Space([
    {
        "algo": "svm",
        "C": log(low=-3, high=5, base=10),
        "kernel": {
            "linear": None,
            "rbf": {
                "gamma": log(low=-2, high=3, base=10)
            }
        }
    },
    {
        "algo": "knn",
        "n_neighbors": quantized_uniform(low=1, high=20, step=1)
    }])

conn = SQLiteConnection(url="sqlite:///db.db")
sampler = QuasiRandom(conn, s)
# sampler = ThompsonSampling(CMAES, conn, s)
token, params = sampler.next()
print(f'Token: {token}')
print(f'Parameters: {params}')

results = conn.results_as_dataframe()
print(results)

The output, exception and stack trace when using ThompsonSampling are:

Token: {'_chocolate_id': 0, '_arm_id': 1}
Parameters: {'C': 80716.84865011052, 'gamma': 3.7193589528638826, 'kernel': 'rbf', 'algo': 'svm'}
Traceback (most recent call last):
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "g:\Research\WS\DeepRL\experimental\test_thompsonsampling_bug.py", line 26, in <module>
    results = conn.results_as_dataframe()
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in results_as_dataframe
    result = s([r[k] for k in s.names()])
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in <listcomp>
    result = s([r[k] for k in s.names()])
KeyError: '_subspace'

Looking at the database that is generated, I can see that the results table is lacking a _subspace column. Note that sampling new parameters works fine, as does storing losses, but I can't extract the results.
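
For anyone who wants to confirm the missing column on their own database, here is a minimal sketch that uses only Python's standard sqlite3 module rather than any chocolate API. The helper names table_columns and missing_columns are hypothetical. The file name db.db is taken from the example above, and the table name results is an assumption based on the observation above.

```python
import sqlite3


def table_columns(db_path, table):
    """Return the column names of `table` in the SQLite file at `db_path`.

    Returns an empty list if the table does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA statements cannot take bound parameters, so the identifier
        # is quoted by hand (embedded double quotes are doubled).
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info yields one row per column; index 1 holds the column name.
        rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    finally:
        conn.close()
    return [row[1] for row in rows]


def missing_columns(db_path, table, expected):
    """Return the names in `expected` that are not columns of `table`."""
    present = set(table_columns(db_path, table))
    return [name for name in expected if name not in present]
```

Calling missing_columns("db.db", "results", ["_subspace"]) after running the ThompsonSampling variant should list _subspace if the table really lacks that column. An empty list would suggest the key is being dropped somewhere else.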

When everything works, I expect the output to look something like the following:

Token: {'_chocolate_id': 0}
Parameters: {'n_neighbors': 7, 'algo': 'knn'}
    n_neighbors algo
id
0             7  knn