dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License

Remove duplicate memory of data in _train() #66

Open xybramble opened 4 years ago

xybramble commented 4 years ago

I ran the project and noticed that, in _train(), the process uses an amount of memory equal to the size of the data. However, the data in train_part() (which is also the return value of concat()) takes up the same amount of memory again. Since train_part() only uses list_of_parts, is it possible to delete the original data in _train() once list_of_parts has been returned?

I tried both client.cancel() and del, but neither worked.
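The duplication described above can be reproduced without dask or xgboost. The following is a hedged, stdlib-only analogy: `concat_parts` and `measure` are hypothetical names (not dask-xgboost's own `concat()`), and plain `bytes` objects stand in for the data partitions. It uses `tracemalloc` to show that concatenating the parts creates a full copy, and that the extra memory is released only once every reference to the original parts is dropped.

```python
import tracemalloc


def concat_parts(parts):
    """Hypothetical stand-in for a concat step: joins parts into one buffer.

    b"".join copies every byte into a brand-new object, so while `parts`
    is still referenced, the same data is held in memory twice.
    """
    return b"".join(parts)


def measure(n_parts=4, part_size=1_000_000):
    """Return (result size, traced bytes before del, traced bytes after del)."""
    tracemalloc.start()
    parts = [bytes(part_size) for _ in range(n_parts)]  # the "original" data
    data = concat_parts(parts)                           # a full copy of it
    before, _ = tracemalloc.get_traced_memory()
    del parts  # drop the last reference to the parts so they can be freed
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(data), before, after


if __name__ == "__main__":
    size, before, after = measure()
    print(f"concatenated {size} bytes")
    print(f"traced before del: {before:,} bytes; after del: {after:,} bytes")
```

In this analogy `del parts` frees memory only because nothing else refers to the parts; if any other object still held them, the copy would persist.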

TomAugspurger commented 4 years ago

Can you clarify a bit how / where / when you see the duplicate memory? You think that dask-xgboost is retaining a reference to some data so it isn't being garbage collected?
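One way to check whether something is retaining a reference is to see if an object survives after your own name for it is deleted: take a `weakref.ref` to it, `del` the name, run `gc.collect()`, and test whether the weak reference is now dead. The following is a minimal stdlib sketch under that assumption; `Block`, `train`, `_retained` and `freed_after_del` are hypothetical and do not come from dask-xgboost.

```python
import gc
import weakref


class Block:
    """Hypothetical stand-in for one chunk of training data."""

    def __init__(self, nbytes):
        self.payload = bytearray(nbytes)


_retained = []  # hypothetical long-lived container, e.g. a cache or global


def train(block, retain):
    """Hypothetical training step that may (accidentally) keep a reference."""
    if retain:
        _retained.append(block)
    return len(block.payload)


def freed_after_del(retain):
    """Return True if the block is garbage collected once our name is deleted."""
    block = Block(1_000_000)
    probe = weakref.ref(block)  # a weak reference does not keep block alive
    train(block, retain)
    del block     # removes this name only, not the object itself
    gc.collect()  # also clear any reference cycles
    return probe() is None


if __name__ == "__main__":
    print("freed when nothing else holds it:", freed_after_del(retain=False))
    print("freed when a container holds it:", freed_after_del(retain=True))
```

If the probe is still alive, `gc.get_referrers(probe())` lists the objects that are holding on to it.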