coiled / feedback

A place to provide Coiled feedback

Cannot run XGBoost examples #166

Status: Closed (phobson closed this issue 2 years ago)

phobson commented 2 years ago

via Zendesk https://coiled.zendesk.com/agent/tickets/85

A user writes:

I tried to run your examples for using xgboost on a Coiled cluster, but encountered several issues:

  1. When using coiled=0.0.54, as described at https://cloud.coiled.io/examples/software, I cannot launch the cluster. Coiled does not accept my token.
  2. When using coiled=0.2.0 for the example named "Scaling XGBoost with Dask and Coiled", the cluster launches successfully, but at the training step the system hangs on: bst = dask_xgboost.train(client, params, X_train, y_train, num_boost_round=3). After some time it throws an error: IndexError: list index out of range
  3. When using coiled=0.2.0 for the example named "How To Run Financial Workloads on Terabytes of Data in Less Than an Hour" (credit-risk), the training step throws an error: ModuleNotFoundError: No module named 'xgboost'
  4. Will I be able to increase the number of workers above the current limit of 255 in the future?
  5. Will I be able to increase the number of cores above the current limit of 2000 in the future?

I run the code from my Windows laptop in a Jupyter notebook. Both notebooks are attached: Archive.zip

@FabioRosado responded (paraphrasing): On (4) and (5):

If you use your own AWS or GCP credentials with Coiled, these limits will be much higher. On Coiled-hosted, we limit the resources each user can use so that one user can't degrade the experience of others.

On xgboost in general:

Looking at xgboost, it seems the reason things were failing is that dask-xgboost is deprecated. We recommend using xgboost directly; you can find a tutorial on the xgboost site.

To resolve this, we need to revisit the examples that use xgboost in Coiled.
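
For reference, the move from dask-xgboost to xgboost's own Dask interface might look roughly like the sketch below. It uses `DaskDMatrix` and `train` from `xgboost.dask`; the function name `train_with_xgboost_dask` and the default `params` are illustrative assumptions, not taken from the Coiled notebooks, and `client`, `X_train`, `y_train` are assumed to be the same objects the failing `dask_xgboost.train` call received.

```python
def train_with_xgboost_dask(client, X_train, y_train, params=None, num_boost_round=3):
    """Train a booster with xgboost's built-in Dask support (hypothetical helper).

    client   -- a dask.distributed.Client connected to the cluster
    X_train  -- Dask array or DataFrame of features
    y_train  -- Dask array or Series of labels
    """
    # Imported lazily so this helper can be defined even where xgboost
    # is not installed; it must be installed locally AND on the workers.
    from xgboost import dask as dxgb

    if params is None:
        # Illustrative defaults only (an assumption, not from the notebooks).
        params = {"objective": "reg:squarederror", "tree_method": "hist"}

    # DaskDMatrix wraps the distributed collections for training.
    dtrain = dxgb.DaskDMatrix(client, X_train, y_train)

    # xgboost.dask.train returns a dict holding the trained booster
    # and the evaluation history.
    output = dxgb.train(client, params, dtrain, num_boost_round=num_boost_round)
    return output["booster"], output["history"]
```

In a notebook this would replace the `bst = dask_xgboost.train(...)` line with something like `bst, history = train_with_xgboost_dask(client, X_train, y_train, params)`.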

ntabris commented 2 years ago

If this user is on Coiled-hosted, can we talk to them and help them move to customer-hosted?

phobson commented 2 years ago

Yes -- I'll reach out, copying you

phobson commented 2 years ago

Got word back from the user that installing the coiled-runtime locally resolved everything:

"Installing a coiled-runtime locally solved the issues with xgboost. Probably you should mention this in your example."

So I suppose this was a version mismatch issue all along.

I presume that these are the examples in question: https://github.com/coiled/notebooks/tree/main/scaling-xgboost

I'll leave this open for now and I'll add a note about the coiled-runtime there.
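
If the cause was a local/cluster version mismatch, a check along the following lines could surface it before training starts. This is a minimal sketch using only the standard library: `local_versions` and `find_mismatches` are hypothetical helper names, and the cluster-side versions are assumed to be supplied as a plain dict. With a live cluster, `distributed.Client.get_versions(check=True)` performs a comparable comparison across the client, scheduler, and workers.

```python
from importlib import metadata


def local_versions(packages):
    """Return {package: installed version, or None if not installed} locally."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(local, remote):
    """Return {package: (local_version, remote_version)} where the two differ."""
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }


# Example with made-up version numbers: xgboost is present locally but
# missing on the cluster, which is the kind of mismatch that can lead
# to a ModuleNotFoundError on the workers.
example = find_mismatches(
    {"dask": "2022.1.0", "xgboost": "1.5.0"},
    {"dask": "2022.1.0", "xgboost": None},
)
print(example)
```

An empty result means the listed packages agree on both sides; anything else is worth resolving before calling the training step.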