Thanks for your question. HPS initializes an embedding cache on each device listed in the user-configured deployed_devices (each device has its own independent embedding cache, and caches are not shared between devices). That is to say, the embedding cache on each device holds the complete set of embedding vectors, so there is no need to query multiple caches to assemble a single embedding vector.
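To make this concrete, here is a minimal sketch of what such a per-device deployment might look like in the HPS JSON configuration. The model name, file paths, and parameter values are hypothetical placeholders, and the exact set of required fields may vary across HugeCTR versions:

```python
import json

# Hypothetical HPS configuration: deploying one model on GPUs 0 and 1.
# Each device in "deployed_device_list" gets its own full, independent
# embedding cache; the table is not sharded across devices.
hps_config = {
    "supportlonglong": True,
    "models": [
        {
            "model": "demo_model",                  # hypothetical model name
            "sparse_files": ["demo_sparse.model"],  # hypothetical table file
            "embedding_table_names": ["sparse_embedding1"],
            "embedding_vecsize_per_table": [16],
            "maxnum_catfeature_query_per_table_per_sample": [26],
            "deployed_device_list": [0, 1],         # one cache per listed GPU
            "max_batch_size": 1024,
            "gpucache": True,
            "gpucacheper": 1.0,                     # cache 100% of the table
        }
    ],
}

with open("hps_demo.json", "w") as f:
    json.dump(hps_config, f, indent=2)
```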
Ok, thanks for your reply. If the GPU memory is not enough to hold the whole embedding table, should I set the GPU cache percentage below 100%? Is that right?
Currently HPS supports three embedding cache types (static, dynamic, and UVM). If your embedding table can be fully loaded into the GPU, it is recommended to choose static to get the best performance. If you need to update the embedding cache dynamically during online inference, or the embedding table cannot be fully loaded into the GPU, you can choose dynamic. For details, please refer to the following links: HPS configuration book and HPS Arch.
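To make these two knobs concrete, here is a sketch of the relevant fields inside a model entry of the HPS JSON configuration. The field names follow the HPS configuration book, while all values here are examples only:

```python
# Sketch: cache settings for a table larger than GPU memory. With the
# "dynamic" cache type, only the configured fraction of the table is kept
# in GPU memory, and missing keys fall back to lower storage tiers.
# These fields go inside a model entry of the HPS JSON configuration.
cache_settings = {
    "embedding_cache_type": "dynamic",  # one of "static", "dynamic", "uvm"
    "gpucache": True,
    "gpucacheper": 0.5,                 # cache ~50% of embeddings on the GPU
    "hit_rate_threshold": 0.9,          # example value
    "cache_refresh_percentage_per_iteration": 0.2,  # example value
}
```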
OK, thanks very much for your reply!
I found that the lookup implementation for HPS only supports specifying a single GPU device id. So I am confused: is there any way for HPS to load an embedding table onto multiple GPUs? If there is, how can I look up an embedding vector on multiple GPUs?
The HPS code location: https://github.com/NVIDIA-Merlin/HugeCTR/blob/91c5c9f16060ffd7ac99867e283f157e85e8a05d/HugeCTR/include/pybind/hps_wrapper.hpp#L41
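For what it's worth, given the earlier answer that each deployed device holds a complete, independent cache, a minimal sketch of multi-GPU usage might look like the following. This assumes the Python module exposes HPS(config_path) and a lookup(keys, model_name, table_id, device_id) method as in the wrapper linked above; the import path, signature, and names are assumptions and may differ across HugeCTR versions:

```python
import numpy as np
from hugectr.inference import HPS  # import path may vary by HugeCTR version

hps = HPS("hps_demo.json")  # config with "deployed_device_list": [0, 1]

keys = np.arange(8, dtype=np.uint64)  # hypothetical embedding keys

# Each deployed device caches the full table, so a single lookup on any
# one device returns complete results (table_id=0, device_id=0 here).
vectors = hps.lookup(keys, "demo_model", 0, 0)

# To spread work across GPUs, split the batch and issue one lookup per
# device; this is batch-level parallelism, not a sharded table.
halves = np.array_split(keys, 2)
results = [hps.lookup(h, "demo_model", 0, dev) for dev, h in enumerate(halves)]
```

In other words, under this design there is no cross-device lookup of a single vector; multiple GPUs are used by routing different requests (or slices of a batch) to different devices, each of which answers from its own complete cache.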