NVIDIA-Merlin / HugeCTR

HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training
Apache License 2.0
905 stars 196 forks source link

Trouble installing hugectr_backend for Triton Server #431

Closed sezhiyanhari closed 7 months ago

sezhiyanhari commented 7 months ago

I'm trying to reproduce this inference example: https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb

I noticed that this example uses the hugectr backend instead of the hps backend. I'm trying to use the hugectr backend, but the installation keeps failing. Here are my installation steps:

  1. I follow the contribution guidelines here (https://nvidia-merlin.github.io/HugeCTR/main/hugectr_contributor_guide.html) and create a container using the Merlin CTR dockerfile
  2. I clone HugeCTR v4.3 because this is the latest version that fully supports the hugectr backend. However, there are always compilation failures, specifically related to librdkafka. Example failures include Since OpenSSL 3.0 [-Werror=deprecated-declarations] and {aka 'struct rd_kafkap_str_s[1]'} [-Werror=array-bounds]. I've tried various versions of librdkafka, but I always get these errors.

Is there something wrong about my installation procedure? Should I recreate the Docker container using Ubuntu 20.04 or 18.04 (as opposed to 22.04) since this was probably the Ubuntu version for HugeCTRv4.3?

I appreciate your help!

yingcanw commented 7 months ago

@sezhiyanhari Thanks for the feedback, the root cause of the examples not working is that the HugeCTR Triton backend has been deprecated from the V23.08(removed from the Triton server repo), and we only keep the pre-installation of HPS backend. So if you need to try more related examples about HPS, you can refer to HPS backend example or HPS python example.