dask / dask-xgboost

Getting ValueError when fitting model #64

Open datainvestor opened 4 years ago

datainvestor commented 4 years ago

I'm getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-23f3b9353ae8> in <module>
      3 
      4 est=XGBClassifier()
----> 5 est.fit(X_train, y_train)

~/anaconda3/lib/python3.7/site-packages/dask_xgboost/core.py in fit(self, X, y, classes, eval_set, sample_weight_eval_set, eval_metric, early_stopping_rounds)
    515             missing=self.missing,
    516             n_jobs=self.n_jobs,
--> 517             early_stopping_rounds=early_stopping_rounds,
    518         )
    519 

~/anaconda3/lib/python3.7/site-packages/dask_xgboost/core.py in train(client, params, data, labels, dmatrix_kwargs, **kwargs)
    240     """
    241     return client.sync(
--> 242         _train, client, params, data, labels, dmatrix_kwargs, **kwargs
    243     )
    244 

~/anaconda3/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    754         else:
    755             return sync(
--> 756                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    757             )
    758 

~/anaconda3/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    331     if error[0]:
    332         typ, exc, tb = error[0]
--> 333         raise exc.with_traceback(tb)
    334     else:
    335         return result[0]

~/anaconda3/lib/python3.7/site-packages/distributed/utils.py in f()
    315             if callback_timeout is not None:
    316                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317             result[0] = yield future
    318         except Exception as exc:
    319             error[0] = sys.exc_info()

~/anaconda3/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/anaconda3/lib/python3.7/site-packages/tornado/gen.py in run(self)
    746                             exc_info = None
    747                     else:
--> 748                         yielded = self.gen.send(value)
    749 
    750                 except (StopIteration, Return) as e:

~/anaconda3/lib/python3.7/site-packages/dask_xgboost/core.py in _train(client, params, data, labels, dmatrix_kwargs, **kwargs)
    182 
    183     # Start the XGBoost tracker on the Dask scheduler
--> 184     host, port = parse_host_port(client.scheduler.address)
    185     env = yield client._run_on_scheduler(
    186         start_tracker, host.strip("/:"), len(worker_map)

~/anaconda3/lib/python3.7/site-packages/dask_xgboost/core.py in parse_host_port(address)
     29     if "://" in address:
     30         address = address.rsplit("://", 1)[1]
---> 31     host, port = address.split(":")
     32     port = int(port)
     33     return host, port

ValueError: not enough values to unpack (expected 2, got 1)

While trying to fit the model:

import pandas as pd

from dask.distributed import Client, progress
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.model_selection import train_test_split
import joblib
from dask import dataframe as ddf
import numpy as np
from dask_ml.model_selection import train_test_split
from dask_ml.xgboost import XGBClassifier, train, predict

# In-process cluster: a single worker with 8 threads, no separate worker processes
client = Client(processes=False, threads_per_worker=8, n_workers=1, memory_limit="16GB")

# Split the dataset into train and test sets
# (df and Y_column are defined earlier in the session and are not shown here)
X_train, X_test, y_train, y_test = train_test_split(df["Store Area"], df[Y_column])

est = XGBClassifier()
est.fit(X_train, y_train)

Any idea why this might be happening?

TomAugspurger commented 4 years ago

Looks similar to https://github.com/dask/dask-xgboost/issues/30? I don't believe that was ever resolved.

FYI, I don't think dask-xgboost will be helpful on a single machine (which you're using with processes=False) but I may be wrong.
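For readers following the traceback, here is a minimal sketch of why the last frame fails. The `parse_host_port` body is copied from the lines shown in the traceback. The two example addresses are assumptions for illustration: a TCP-style address, and an in-process style address of the kind a `Client(processes=False)` scheduler is expected to report, which has no `host:port` part once the scheme is stripped.

```python
def parse_host_port(address):
    # Same logic as the lines shown in the traceback
    # (dask_xgboost/core.py, lines 29-33).
    if "://" in address:
        address = address.rsplit("://", 1)[1]
    host, port = address.split(":")
    port = int(port)
    return host, port


# Hypothetical TCP-style scheduler address: splits cleanly into host and port.
tcp_result = parse_host_port("tcp://127.0.0.1:8786")

# Hypothetical in-process style address (assumed format): after the scheme is
# removed there is no ":" left, so split(":") returns a single element and the
# two-name unpacking raises ValueError.
try:
    parse_host_port("inproc://192.168.1.10/12345/1")
    inproc_error = None
except ValueError as exc:
    inproc_error = str(exc)

print(tcp_result)
print(inproc_error)
```

If that assumption about the address format holds, the error comes from the address parsing rather than from the training data, which fits the suggestion to avoid `processes=False` here.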

datainvestor commented 4 years ago

This looks similar, I think. So are you saying it is impossible to use Dask with XGBoost now?

TomAugspurger commented 4 years ago

No. I think you don't want to be using dask-xgboost with Client(processes=False).

On Fri, Dec 6, 2019 at 3:33 PM datainvestor notifications@github.com wrote:

This looks similar I think. So you say it is impossible to use Dask with XGBoost now?
