dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License

Remove duplicate memory of data in _train() in dask_xgboost #67

Closed: xybramble closed this issue 4 years ago

xybramble commented 4 years ago

When I run the project, I notice that the process inside _train() uses as much memory as the data itself. On top of that, the data in train_part() (which is the return value of concat()) takes up the same amount of memory again, so the data is effectively held twice. Since train_part() only uses list_of_parts, is it possible to delete the original data in _train() once list_of_parts has been returned?

I tried both client.cancel() and del, but neither worked.
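
To illustrate the kind of duplication described above, here is a minimal sketch in plain NumPy. It is not the dask-xgboost code: `concat_and_release` is a hypothetical helper, and the array sizes are arbitrary. It shows that concatenation allocates a new buffer, so the parts and the combined array coexist until every reference to the parts is dropped, and that `del` on one name frees nothing while another reference is still alive. Whether that is what happens in this issue is not confirmed by the thread. The immediate release shown here relies on CPython reference counting.

```python
import weakref

import numpy as np


def concat_and_release(parts):
    """Hypothetical helper: concatenate the parts, then empty the list.

    np.concatenate copies into a fresh buffer, so peak memory is roughly
    twice the data size until the original parts are released.
    """
    combined = np.concatenate(parts, axis=0)
    parts.clear()  # drop the list's references to the original arrays
    return combined


# Four arbitrary parts standing in for the partitions held on one worker.
parts = [np.ones((1000, 10)) for _ in range(4)]
total_bytes = sum(p.nbytes for p in parts)
probe = weakref.ref(parts[0])  # lets us observe when the first part is freed

combined = concat_and_release(parts)
same_size = combined.nbytes == total_bytes  # the copy is as large as the parts
parts_freed = probe() is None               # no references left to the first part

# `del` only removes one name; the array lives on while another reference exists.
part = np.ones((1000, 10))
alias = part
probe2 = weakref.ref(part)
del part
still_alive = probe2() is not None
del alias
freed_after_alias = probe2() is None
```

In a distributed setting, futures, scheduler state, or other worker-side references could play the role of `alias` here, which would be one possible reason `del` appears to have no effect.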

TomAugspurger commented 4 years ago

Duplicate of #66 I think.