NVIDIA-Merlin / HugeCTR

HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training
Apache License 2.0
950 stars, 200 forks

[Question] Does HugeCTR support reading training data from Kafka? #413

Closed sparkling9809 closed 1 year ago

sparkling9809 commented 1 year ago

I want to read data from Kafka to implement real-time training, but the data reader in HugeCTR currently supports only files. Is there any way to read training data from Kafka? Thanks.

yingcanw commented 1 year ago

Thanks for your question. Currently, HugeCTR supports reading Parquet data, as well as loading and saving models from/to remote file systems such as HDFS, AWS S3, and GCS. Kafka is supported only in inference, where it is used for online updates of incremental models to HPS. @jershi425 Please add your comments.

jershi425 commented 1 year ago

Yes, as @yingcanw said, we currently don't support reading or streaming training data from Kafka. Kafka is used only for model updates. We recommend using our data reader to read Parquet data for training because of its better performance and convenience.
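
Since training input has to arrive as files, one possible workaround (an assumption on my part, not a HugeCTR feature) is to group the incoming stream into fixed-size micro-batches outside HugeCTR and write each one out as a Parquet shard for the data reader to pick up. Below is a minimal sketch of only the batching step, using the standard library. The names `micro_batches` and `shard_name`, the record fields, and the file naming scheme are all hypothetical, and a plain generator stands in for a real Kafka consumer.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a (possibly unbounded) record stream into lists of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        # Pull up to batch_size records; an empty list means the stream ended.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def shard_name(index: int, prefix: str = "train") -> str:
    """Hypothetical naming scheme for the Parquet shard that holds batch `index`."""
    return f"{prefix}_{index:05d}.parquet"


# Stand-in for a Kafka consumer: any iterable of records works here.
fake_stream = ({"label": i % 2, "C1": i} for i in range(7))

# Plan which shard each micro-batch would go to, and how many rows it holds.
plan = [(shard_name(i), len(b)) for i, b in enumerate(micro_batches(fake_stream, 3))]
```

Each batch would then be written to Parquet with a library of your choice, in the layout the HugeCTR Parquet data reader expects. This gives near-real-time training at file granularity rather than true streaming.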

sparkling9809 commented 1 year ago

OK, thanks!