RUCAIBox / RecBole-GNN

Efficient and extensible GNNs enhanced recommender library based on RecBole.
MIT License

KeyError when running hyperparameter search #62

Open ithok opened 1 year ago

ithok commented 1 year ago

ERROR:hyperopt.fmin:job exception: 'model'

0%| | 0/12 [1:01:04<?, ?trial/s, best loss=?]
Traceback (most recent call last):
  File "run_hyper.py", line 26, in <module>
    main()
  File "run_hyper.py", line 18, in main
    hp.run()
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/recbole/trainer/hyper_tuning.py", line 411, in run
    fmin(
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 553, in fmin
    rval.exhaust()
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 356, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 292, in run
    self.serial_evaluate()
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/base.py", line 907, in evaluate
    rval = self.fn(pyll_rval)
  File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/recbole/trainer/hyper_tuning.py", line 349, in trial
    result_dict["model"],
KeyError: 'model'

As the title says. The error message is shown above. The model I am running is HMLET, and the command I entered is:

python run_hyper.py --model='HMLET' --dataset='ml-1m' --config_files='ml-1m.yaml' --params_file=Hmlet.hyper

hyp1231 commented 1 year ago

Which version of recbole are you using? I tested with both recbole 1.0.1 and 1.1.1 and could not reproduce the problem you describe.

ithok commented 1 year ago

Name                     Version           Build            Channel
_libgcc_mutex            0.1               main
_openmp_mutex            5.1               1_gnu
absl-py                  1.4.0             pypi_0           pypi
ca-certificates          2023.01.10        h06a4308_0
cachetools               5.3.0             pypi_0           pypi
certifi                  2022.12.7         py38h06a4308_0
charset-normalizer       2.1.1             pypi_0           pypi
cloudpickle              2.2.1             pypi_0           pypi
cmake                    3.25.0            pypi_0           pypi
colorama                 0.4.4             pypi_0           pypi
colorlog                 4.7.2             pypi_0           pypi
filelock                 3.9.0             pypi_0           pypi
future                   0.18.3            pypi_0           pypi
google-auth              2.16.2            pypi_0           pypi
google-auth-oauthlib     0.4.6             pypi_0           pypi
grpcio                   1.51.3            pypi_0           pypi
hyperopt                 0.2.5             pypi_0           pypi
idna                     3.4               pypi_0           pypi
importlib-metadata       6.1.0             pypi_0           pypi
jinja2                   3.1.2             pypi_0           pypi
joblib                   1.2.0             pypi_0           pypi
ld_impl_linux-64         2.38              h1181459_1
libffi                   3.4.2             h6a678d5_6
libgcc-ng                11.2.0            h1234567_1
libgomp                  11.2.0            h1234567_1
libstdcxx-ng             11.2.0            h1234567_1
lit                      15.0.7            pypi_0           pypi
markdown                 3.4.1             pypi_0           pypi
markupsafe               2.1.2             pypi_0           pypi
mpmath                   1.2.1             pypi_0           pypi
ncurses                  6.4               h6a678d5_0
networkx                 3.0               pypi_0           pypi
numpy                    1.23.5            pypi_0           pypi
oauthlib                 3.2.2             pypi_0           pypi
openssl                  1.1.1t            h7f8727e_0
pandas                   1.5.3             pypi_0           pypi
pillow                   9.3.0             pypi_0           pypi
pip                      23.0.1            py38h06a4308_0
plotly                   5.13.1            pypi_0           pypi
protobuf                 4.22.1            pypi_0           pypi
psutil                   5.9.4             pypi_0           pypi
py4j                     0.10.9.7          pypi_0           pypi
pyasn1                   0.4.8             pypi_0           pypi
pyasn1-modules           0.2.8             pypi_0           pypi
pyg-lib                  0.2.0+pt20cu117   pypi_0           pypi
pyparsing                3.0.9             pypi_0           pypi
python                   3.8.16            h7a1cb2a_3
python-dateutil          2.8.2             pypi_0           pypi
pytz                     2022.7.1          pypi_0           pypi
pyyaml                   6.0               pypi_0           pypi
readline                 8.2               h5eee18b_0
recbole                  1.1.1             pypi_0           pypi
requests                 2.28.1            pypi_0           pypi
requests-oauthlib        1.3.1             pypi_0           pypi
rsa                      4.9               pypi_0           pypi
scikit-learn             1.2.2             pypi_0           pypi
scipy                    1.10.1            pypi_0           pypi
setuptools               65.6.3            py38h06a4308_0
six                      1.16.0            pypi_0           pypi
sqlite                   3.41.1            h5eee18b_0
sympy                    1.11.1            pypi_0           pypi
tabulate                 0.9.0             pypi_0           pypi
tenacity                 8.2.2             pypi_0           pypi
tensorboard              2.12.0            pypi_0           pypi
tensorboard-data-server  0.7.0             pypi_0           pypi
tensorboard-plugin-wit   1.8.1             pypi_0           pypi
thop                     0.1.1-2209072238  pypi_0           pypi
threadpoolctl            3.1.0             pypi_0           pypi
tk                       8.6.12            h1ccaba5_0
torch                    2.0.0+cu117       pypi_0           pypi
torch-cluster            1.6.1+pt20cu117   pypi_0           pypi
torch-geometric          2.2.0             pypi_0           pypi
torch-scatter            2.1.1+pt20cu117   pypi_0           pypi
torch-sparse             0.6.17+pt20cu117  pypi_0           pypi
torch-spline-conv        1.2.2+pt20cu117   pypi_0           pypi
torchaudio               2.0.1+cu117       pypi_0           pypi
torchvision              0.15.1+cu117      pypi_0           pypi
tqdm                     4.65.0            pypi_0           pypi
triton                   2.0.0             pypi_0           pypi
typing-extensions        4.4.0             pypi_0           pypi
urllib3                  1.26.13           pypi_0           pypi
werkzeug                 2.2.3             pypi_0           pypi
wheel                    0.38.4            py38h06a4308_0
xz                       5.2.10            h5eee18b_1
zipp                     3.15.0            pypi_0           pypi
zlib                     1.2.13            h5eee18b_0

Hello, it is version 1.1.1. The conda list output is shown above.

ithok commented 1 year ago

It looks like a hyperopt problem.

ithok commented 1 year ago

Hello, could you tell me which version of hyperopt you are running?

hyp1231 commented 1 year ago

Hello! Our preliminary assessment is that this is a bug in how we adapted RecBole-GNN to RecBole 1.1.1.

Cause of the bug: a commit in RecBole 1.1.1 added a new return value to the objective function used for hyperparameter tuning:

https://github.com/RUCAIBox/RecBole/commit/05a223e4de15ad9f722ed6a2cf9ecdf6dfc7fc16#diff-fe46181d8ec96ec5c6e9b1edd0dd0af2b8e4f8b4cab6a7e5a2090ef8a34f20eeL334-R347

However, the hyperparameter tuning objective function in RecBole-GNN does not return the 'model' key:

https://github.com/RUCAIBox/RecBole-GNN/blob/77d76b9e19436220b32fe61af95ffd95f17c1db7/recbole_gnn/quick_start.py#L82-L87
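The mismatch described here can be reproduced in isolation. The following is only a minimal sketch: `gnn_objective` and `trial` are hypothetical stand-ins written for illustration, not the actual RecBole or RecBole-GNN source, and the score value is made up.

```python
# Hypothetical stand-ins; not the actual RecBole or RecBole-GNN source.

def gnn_objective(config_dict):
    """Mimics an objective function whose result omits the 'model' key."""
    return {"best_valid_score": 0.25, "valid_score_bigger": True}


def trial(objective, config_dict):
    """Mimics a tuner that reads 'model' from the objective's result."""
    result_dict = objective(config_dict)
    return result_dict["model"], result_dict["best_valid_score"]


def reproduce():
    """Return the repr of the raised KeyError, or None if nothing was raised."""
    try:
        trial(gnn_objective, {"learning_rate": 0.001})
    except KeyError as err:
        return repr(err)
    return None


if __name__ == "__main__":
    print(reproduce())
```

The tuner side assumes a key that the objective side never supplies, so the lookup fails on the first trial regardless of the hyperopt version installed.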

If you are in a hurry, you can work around it for now by adding a new key to the return value near line 82 of recbole_gnn/quick_start.py:

return {
  'model': config['model'],
  # ... ...
}
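If editing the installed package is inconvenient, the same idea could be applied from the calling script by wrapping the objective function. This is only a sketch under assumptions: `with_model_key` and `fake_objective` are hypothetical names introduced here, and the objective is assumed to return a plain dict, as in the snippet above.

```python
import functools


def with_model_key(objective, model_name):
    """Wrap an objective so its result dict always carries a 'model' key.

    Hypothetical helper for illustration; not part of RecBole or RecBole-GNN.
    """
    @functools.wraps(objective)
    def wrapped(*args, **kwargs):
        # Copy the result so the original dict is left untouched.
        result = dict(objective(*args, **kwargs))
        # Only fill in 'model' when the objective did not provide it.
        result.setdefault("model", model_name)
        return result

    return wrapped


def fake_objective(config_dict=None, config_file_list=None):
    """Stand-in objective that omits 'model', as in the reported bug."""
    return {"best_valid_score": 0.25, "valid_score_bigger": True}


if __name__ == "__main__":
    patched = with_model_key(fake_objective, "HMLET")
    print(patched({"learning_rate": 0.001})["model"])
```

The wrapped function would then be passed to the tuner in place of the original objective. Once the library itself returns 'model', `setdefault` leaves that value alone, so the wrapper becomes a no-op.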

We will also fix and test this right away. Thank you for finding this bug!!

ithok commented 1 year ago

OK, thank you very much!