NVIDIA-Merlin/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training
Apache License 2.0
937 stars, 200 forks
issues
[Question] Confused about the additional element of the output of InteractionLayer
#408
heroes999
closed
1 year ago
6
[Question] Is there any way for hps to load an embedding table into multiple GPUs?
#407
sparkling9809
closed
1 year ago
4
[Question]link for day_1.gz is invalid
#406
zmxdream
closed
1 year ago
0
Update session_inference_test.cpp
#405
lxh
closed
11 months ago
0
[Question] Multi-node training encounters Runtime error: unhandled system error ncclGroupEnd()
#404
heroes999
closed
1 year ago
9
[Question] position bias
#403
skunkwerk
closed
1 year ago
2
[Question] A question towards HugeCTR::concurrent_unordered_map::get_insert
#402
heroes999
closed
1 year ago
6
[BUG]dlrm script has quite a lot compatibility issues.
#401
zpcalan
closed
1 year ago
2
[Question] Are SHARP and IB a must-have for multi-node training?
#400
heroes999
closed
1 year ago
3
[Question]Can I build and use gpu_cache independently?
#399
RobertLou
closed
1 year ago
1
[Question]Any randomness in data reader? Any randomness in Model.fit?
#398
heroes999
closed
1 year ago
3
[Question]How to process criteo day0(50GB)'s dataset to run ETC?
#397
zpcalan
closed
11 months ago
11
Can't install sparse_operation_kit
#396
yourtj
closed
1 year ago
2
[Question]Can't use ETC to train multiple datasets.
#395
zpcalan
closed
1 year ago
3
[Question]loss_test not stable, sometimes some cases will fail
#394
heroes999
closed
1 year ago
6
[Question]When setting use_mixed_precision=True, wdl training does not converge.
#393
zpcalan
closed
1 year ago
22
Redirect master pages to main
#392
alexanderronquillo
closed
1 year ago
0
[Question] Failed to run lookup sparse distribute example
#391
Nov11
closed
1 year ago
1
[Question] Does ETC training feature support to run on multiple physical nodes?
#390
zpcalan
closed
1 year ago
1
[BUG]Can NOT run wdl_parquet.py: CUDNN_STATUS_MAPPING_ERROR
#389
butterluo
closed
11 months ago
6
[Question] SOK - How to save sok.experiment.Variable correctly into saved model?
#388
Nov11
closed
1 year ago
2
[Question] Calling 'apply_gradients' on sok.experiment.Variable reports Variable not created in the strategy scope
#387
Nov11
closed
1 year ago
2
[BUG] Failed to process day23 of criteo data.
#386
zpcalan
closed
1 year ago
3
[Question] Having a hard time running demo with TensorFlow 2 MirroredStrategy
#385
Nov11
closed
1 year ago
2
[Question] negative item counts in MovieLens notebook
#384
JohnFirth
closed
1 year ago
1
[BUG] cannot run sok demo with official image
#383
ZhuYuJin
closed
1 year ago
5
[BUG] EmbeddingCollection Wgrad buffer sizes can overflow a 32 bit integer.
#382
zpzim
closed
1 year ago
3
[Question] Does HugeCTR support feature selection & feature elimination?
#381
wzhgithub
closed
1 year ago
1
Fix UT failure for l2_regularizer_layer
#380
EmmaQiaoCh
closed
1 year ago
1
[Question] How to get the performance of inference
#379
liangxuegang
closed
1 year ago
2
[BUG] Encountered GPU utilization of 100% while using the SparseOperationKit Experiment API.
#378
Acacia124
closed
1 year ago
4
[BUG] Documentation for Optimizer types has a typo.
#377
ashish007git
closed
1 year ago
2
modify EV name
#376
Mesilenceki
closed
1 year ago
1
[BUG] around 200 layer unit tests fail in my hugectr container, pls lend a hand
#375
heroes999
closed
1 year ago
2
[Question] how to not use cuda graph in hugectr?
#374
LucQueen
closed
1 year ago
2
[Question] How to correctly use Embedding Training Cache feature in HugeCTR
#373
yuqie
closed
1 year ago
2
[Question] What about the segmentation fault when I train DLRM in MLPerf?
#372
LucQueen
closed
1 year ago
6
[BUG] Program crashes on garbage collection of inference session / model
#371
yakoton
closed
1 year ago
3
[Question]How can i debug core dump when i use hugectr
#370
LucQueen
closed
1 year ago
1
Fix hps doc typo
#369
yingcanw
closed
11 months ago
1
[BUG] database
#368
zhaozheng09
closed
1 year ago
1
original error: libcuda.so.1: cannot open shared object file: No such file or directory, a problem occurred in the docker image nvcr.io/nvidia/tensorflow:22.06-tf2-py3
#367
shijiexu09
closed
1 year ago
2
[Question]DIN sample slot_size_array and key range overlap
#366
liguo88
closed
1 year ago
2
[Question] Initialize SOK embedding on CPU to prevent OOM
#365
WonderingWJ
closed
1 year ago
1
[BUG] DIN sample refers to old version of NVTabular and produces error when running w/ 22.09 container
#364
jsohn-nvidia
closed
1 year ago
3
[Requirement] TLS communication for cloud-hosted HPS
#363
Spartee
closed
1 year ago
6
[BUG] HPS tensorflow plugin, multi-gpu example crashes
#362
molamooo
closed
1 year ago
3
[BUG] WDL training notebook for HugeCTR processing workflow fails with TypeError
#361
Spartee
closed
1 year ago
5
[BUG] Criteo example fail due to "Runtime error: file list open failed: ./criteo_data/file_list.txt"
#360
thuningxu
closed
1 year ago
3
Typo:dropoutlayer rate typo
#359
JacoCheung
closed
2 years ago
1