DataCanvasIO / HyperGBM

A full pipeline AutoML tool for tabular data
https://hypergbm.readthedocs.io/
Apache License 2.0

ValueError: Metadata mismatch found in `from_delayed`. #107

Open · wangjianqiao111 opened this issue 4 months ago

wangjianqiao111 commented 4 months ago

Please make sure that this is a bug.

System information

Describe the current behavior

'''
2024-05-29 14:55:30 [ERROR] Traceback (most recent call last):
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-29 14:55:30 [ERROR]     return _run_code(code, main_globals, None,
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-29 14:55:30 [ERROR]     exec(code, run_globals)
2024-05-29 14:55:30 [ERROR]   File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 144, in
2024-05-29 14:55:30 [ERROR]   File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 132, in
2024-05-29 14:55:30 [ERROR]   File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 41, in get_args_func
2024-05-29 14:55:30 [ERROR]   File "/opt/pylib/dc_runtime.zip/datacanvas/shell.py", line 73, in _execfile
2024-05-29 14:55:30 [ERROR]   File "main.py", line 126, in
2024-05-29 14:55:30 [ERROR]     step.fit(df_train=df_train, df_test=df_test)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/workdir/code_120b1eff-da34-4ae4-b9eb-752dd0f776ba/hypergbm_step.py", line 131, in fit
2024-05-29 14:55:30 [ERROR]     experiment = make_experiment(log_level='INFO', verbose=1, use_cache=False, **hypergbm_params_input)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypergbm/experiment.py", line 226, in make_experiment
2024-05-29 14:55:30 [ERROR]     experiment = _make_experiment(hyper_model_cls, train_data,
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/experiment/_maker.py", line 378, in make_experiment
2024-05-29 14:55:30 [ERROR]     id = hasher(dict(X_train=X_train, y_train=y_train, X_test=X_test, X_eval=X_eval, y_eval=y_eval,
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/data_hasher.py", line 20, in __call__
2024-05-29 14:55:30 [ERROR]     for x in self._iter_data(data):
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 21, in _iter_data
2024-05-29 14:55:30 [ERROR]     yield from super()._iter_data(data)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/data_hasher.py", line 58, in _iter_data
2024-05-29 14:55:30 [ERROR]     yield from self._iter_data(v)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 15, in _iter_data
2024-05-29 14:55:30 [ERROR]     yield from self._iter_dask_dataframe(data)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/hypernets/tabular/dask_ex/_data_hasher.py", line 30, in _iter_dask_dataframe
2024-05-29 14:55:30 [ERROR]     meta={name: 'u8'}).compute()
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/dask_expr/_collection.py", line 476, in compute
2024-05-29 14:55:30 [ERROR]     return DaskMethodsMixin.compute(out, **kwargs)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/dask/base.py", line 375, in compute
2024-05-29 14:55:30 [ERROR]     (result,) = compute(self, traverse=False, **kwargs)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/dask/base.py", line 661, in compute
2024-05-29 14:55:30 [ERROR]     results = schedule(dsk, keys, **kwargs)
2024-05-29 14:55:30 [ERROR]   File "/opt/aps/python/lib/python3.10/site-packages/dask/dataframe/utils.py", line 424, in check_meta
2024-05-29 14:55:30 [ERROR]     raise ValueError(
2024-05-29 14:55:30 [ERROR] ValueError: Metadata mismatch found in `from_delayed`.
2024-05-29 14:55:30 [ERROR] Partition type: pandas.core.frame.DataFrame
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:30 [ERROR] | Column      | Found  | Expected |
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:30 [ERROR] | 'contact'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'default'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'education' | object | string   |
2024-05-29 14:55:30 [ERROR] | 'housing'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'job'       | object | string   |
2024-05-29 14:55:30 [ERROR] | 'loan'      | object | string   |
2024-05-29 14:55:30 [ERROR] | 'marital'   | object | string   |
2024-05-29 14:55:30 [ERROR] | 'month'     | object | string   |
2024-05-29 14:55:30 [ERROR] | 'poutcome'  | object | string   |
2024-05-29 14:55:30 [ERROR] | 'y'         | object | string   |
2024-05-29 14:55:30 [ERROR] +-------------+--------+----------+
2024-05-29 14:55:31 [ERROR] errorCode is 1
'''
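The table in the log lists the text columns whose partitions arrive with `object` dtype while the dask metadata records `string`. The sketch below reproduces that kind of per-column comparison in plain pandas so the mismatch can be seen in isolation. It is a simplified illustration, not dask's `check_meta`; the helper `dtype_mismatches` and the sample data are hypothetical, and dtypes are compared by name for simplicity.

```python
import pandas as pd


def dtype_mismatches(partition: pd.DataFrame, meta: pd.DataFrame) -> list:
    """Return (column, found, expected) for every column whose dtype differs.

    Simplified illustration only: dtypes are compared by their string names,
    which is not how dask's own check_meta works internally.
    """
    mismatches = []
    for col in meta.columns:
        found = str(partition[col].dtype)
        expected = str(meta[col].dtype)
        if found != expected:
            mismatches.append((col, found, expected))
    return mismatches


# Hypothetical partition: the text column is explicitly stored as object dtype.
partition = pd.DataFrame({
    "job": pd.Series(["admin.", "technician"], dtype=object),
    "age": [30, 41],
})

# Hypothetical metadata (an empty frame) declaring the text column as "string".
meta = partition.astype({"job": "string"}).iloc[:0]

print(dtype_mismatches(partition, meta))
```

Here `job` is reported as `('job', 'object', 'string')`, mirroring the rows of the table above, while `age` has the same dtype on both sides and is omitted.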

Describe the expected behavior

'''The run should complete successfully.'''

Standalone code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to a Jupyter notebook.

Are you willing to submit a PR? (Yes/No)

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

wangjianqiao111 commented 4 months ago

The error above occurs when running HyperGBM-dask on a binary classification task.

lixfz commented 4 months ago

You can downgrade pandas to version 1.5.x and try again.
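If you follow this suggestion, it can help to confirm which pandas version the environment actually has installed before rerunning the experiment. The following is a minimal stdlib-only pre-flight check, assuming the 1.5.x target suggested above; `is_pandas_1_5` is a hypothetical helper, not part of HyperGBM or pandas.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_pandas_1_5(version_string: str) -> bool:
    """Return True if a pandas version string belongs to the 1.5.x series."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (1, 5)


if __name__ == "__main__":
    # Read the installed version from package metadata without importing pandas.
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        installed = None

    if installed is None:
        print("pandas is not installed")
    elif is_pandas_1_5(installed):
        print(f"pandas {installed}: matches the suggested 1.5.x series")
    else:
        print(f"pandas {installed}: not 1.5.x; consider downgrading before rerunning")
```

Reading the version through `importlib.metadata` avoids importing pandas just to check it, and the regex only inspects the major and minor numbers, so release candidates such as `1.5.0rc0` still count as 1.5.x.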