KWiecko opened this issue 3 years ago
Would suggest looking at XGBoost's own Dask integration over using this library (if it is an option for you)
https://xgboost.readthedocs.io/en/latest/tutorials/dask.html
Unfortunately, I am bound to xgboost==0.90. I tried to find a Dask API buried inside xgboost==0.90, both in the docs (https://xgboost.readthedocs.io/en/release_0.90/index.html) and across the web, and the only solution I was able to find was dask-xgboost + xgboost.
What happened:
When `sample_weight` is specified and the `npartitions` of the inputs passed to `train()` is not equal to the number of workers, the following evaluation:

`sample_weight = concat(sample_weight) if np.all(sample_weight) else None`

fails :(
What you expected to happen: Train model without exceptions :)
Minimal Complete Verifiable Example:
The following code raises an exception:
"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
The bug is located on line 102 of the `dask_xgboost/core.py` file (https://github.com/dask/dask-xgboost/blob/master/dask_xgboost/core.py).
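The failure mechanism can be sketched without Dask at all. A minimal reproduction, assuming a worker receives its weight parts as a list of pandas Series of unequal length (which is what happens when `npartitions` exceeds the number of workers):

```python
import numpy as np
import pandas as pd

# Several partitions of different lengths can only be held in an
# object array; np.all() then asks each Series for its truth value,
# which pandas refuses to answer.
parts = np.empty(2, dtype=object)
parts[0] = pd.Series([1.0, 2.0])
parts[1] = pd.Series([3.0])

try:
    np.all(parts)
except ValueError as err:
    print(err)  # The truth value of a Series is ambiguous...
```

With a single partition per worker the list collapses to one numeric array and `np.all` happens to work, which is why the bug only surfaces when partitions and workers are mismatched.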
The following chunk proposes a change that should fix this:
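As a hedged illustration (not necessarily the exact patch), one possible guard would test only for missing parts instead of evaluating the weight values themselves. This sketch assumes `sample_weight` reaches this point as a list of pandas Series parts, and uses `pd.concat` in place of the library's internal `concat` helper:

```python
import pandas as pd

def concat_weights(sample_weight):
    # Hypothetical replacement for the np.all(...) guard: check for
    # missing parts only, so a Series is never forced to a single
    # truth value and zero-valued weights survive.
    if not sample_weight or any(w is None for w in sample_weight):
        return None
    return pd.concat(sample_weight)

parts = [pd.Series([1.0, 0.0]), pd.Series([2.0])]
print(concat_weights(parts).tolist())  # [1.0, 0.0, 2.0]
print(concat_weights(None))            # None
```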
One notable thing is that `np.all(sample_weight)` will evaluate to False when the Series contains a 0. Is that expected (is a weight of 0 allowed for a single observation)? If a zero weight is allowed, this condition will make dask-xgboost silently skip valid weights that contain 0s.
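The zero-weight problem is easy to demonstrate in isolation:

```python
import numpy as np
import pandas as pd

# XGBoost accepts a per-row weight of 0 (the row is simply ignored),
# but np.all() treats 0 as falsy, so one zero-weighted observation
# would cause the entire weight vector to be discarded.
w = pd.Series([1.0, 0.0, 2.0])
print(bool(np.all(w)))  # False
```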
Both conditions fail silently when one of the weights is NaN. I suppose that when `sample_weight` is not None but the weight for one of the observations is NaN, the code should raise an exception, e.g.:
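A hypothetical validation along those lines (note that `bool(float('nan'))` is True, so NaNs slip straight through an `np.all`-based guard):

```python
import numpy as np
import pandas as pd

def check_weights(sample_weight):
    # Hypothetical check: a NaN weight is almost certainly a data
    # error, so fail loudly instead of training on it.
    if sample_weight is not None and sample_weight.isna().any():
        raise ValueError("sample_weight contains NaN values")

try:
    check_weights(pd.Series([1.0, np.nan, 2.0]))
except ValueError as err:
    print(err)  # sample_weight contains NaN values
```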
This was a tricky one to find.
Anything else we need to know?:
Environment: