PerkzZheng opened this issue 4 years ago
Now that NVTabular 0.2 processes datasets with Dask, is NVTabular's chunk logic still in effect, or is it completely replaced by Dask's partition logic?
NVTabular now uses Dask for out-of-core processing. The partitions are typically chosen to be ~1/8 the size of a single GPU's total memory. We assume that all GPUs in your system are identical.
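For concreteness, a minimal sketch of how the partition size can be controlled when constructing the dataset (parameter names from the 0.2-era API, and the path is just a placeholder, so double-check against your version):

```python
import nvtabular as nvt

# Pin partitions to a fixed byte size ...
dataset = nvt.Dataset("/path/to/data/*.parquet", engine="parquet", part_size="1GB")

# ... or express them as a fraction of a single GPU's memory
# (the default is roughly 1/8 of total device memory).
dataset = nvt.Dataset("/path/to/data/*.parquet", engine="parquet", part_mem_fraction=0.125)
```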
I'm a bit confused by your pool-size setting - Are you saying that you are using P100s and P4s at the same time?
@rjzamora
Yes, I am using V100s and P4s at the same time for multi-GPU processing.
I may need to set parameters for each GPU separately. Does client.run() support setting a different rmm pool size for different GPUs?
Very interesting. I honestly don't know the answer to this.
I will ask some people with dask/distributed knowledge to see if the distributed scheduler will actually consider a heterogeneous memory topology. If not, you may just have to choose settings as if all your GPUs were P4s. That is, you may need to explore settings like:
- rmm_pool_size=7_000_000_000
- device_memory_limit=6_000_000_000 (or maybe even 4-5GB to help with spilling)
- Dataset: part_size="1GB"
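Put together, a rough sketch of how those settings might look with dask_cuda's LocalCUDACluster (the numbers are just the guesses above, not tuned values, and the data path is a placeholder):

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import nvtabular as nvt

# Size everything as if every GPU were a P4 (8GB): a ~7GB RMM pool, and a lower
# device_memory_limit so Dask starts spilling before the pool fills up.
cluster = LocalCUDACluster(
    rmm_pool_size=7_000_000_000,
    device_memory_limit=6_000_000_000,  # or 4-5GB to spill earlier
)
client = Client(cluster)

# Smaller partitions also reduce the peak memory needed per task.
dataset = nvt.Dataset("/path/to/data/*.parquet", engine="parquet", part_size="1GB")
```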
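On the per-GPU question: I haven't verified how well this plays with the scheduler, but client.run does accept a workers= argument, so one experiment would be re-initializing the RMM pool with a different size only on the workers that own the smaller GPUs. A sketch, assuming the client from the snippet above, and with the P4 device indices ("2", "3") as pure placeholders you would need to replace:

```python
import os
import rmm

# dask-cuda pins each worker to one GPU and puts that GPU first in the
# worker's CUDA_VISIBLE_DEVICES, so this maps worker address -> device id.
gpu_map = client.run(lambda: os.environ.get("CUDA_VISIBLE_DEVICES"))

# Pick out the workers that own P4s (placeholder device indices).
p4_workers = [addr for addr, dev in gpu_map.items()
              if dev and dev.split(",")[0] in {"2", "3"}]

# Re-create a smaller RMM pool only on those workers.
client.run(
    rmm.reinitialize,
    pool_allocator=True,
    initial_pool_size=7_000_000_000,
    workers=p4_workers,
)
```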
Thanks. It helped a lot!
What is your question?
NVTabular 0.1 read big files in chunks so as not to exceed the memory of a single GPU. Now that NVTabular 0.2 processes datasets with Dask, is NVTabular's chunk logic still in effect, or is it completely replaced by Dask's partition logic?
Also, I hit an "exceeding maximum pool size" error when reading large files with NVTabular 0.2 (Dask).
ENV: