Oneflow-Inc / models

Models and examples built with OneFlow
Apache License 2.0

Dlrm benchmark test #375

Open ShawnXuan opened 2 years ago

ShawnXuan commented 2 years ago

dlrm benchmark test scripts

ShawnXuan commented 2 years ago

Regarding the following options:

export CUDA_DEVICE_MAX_CONNECTIONS=32
export ONEFLOW_EP_CUDA_STREAM_FLAGS=1
export ONEFLOW_RAW_READER_PREFETCHING_QUEUE_DEPTH=512
export ONEFLOW_RAW_READER_NUM_WORKERS=1

export LD_PRELOAD=/usr/lib64/libjemalloc.so.1

numactl --interleave=all \

I ran a set of experiments and recorded the average latency (ms) over 74,000 iterations. The results are as follows:

ON          OFF
1.41855692  1.44409019
1.42942288  1.43027312
1.42626776  1.43327031
1.43100398  1.43726633
1.43247646  1.43108837
1.43085669  1.4360571
1.4250376   1.43052549
1.4246417   1.44208097
1.42638928  1.43673026
1.43390266  1.43774178
1.42238418  1.43597748
1.43701162  1.43563187
1.42529816  1.43994857
1.42365005  1.43631018
1.43174504  1.43489774
1.42973357  1.43393828
1.4347752
1.43040477

Summary statistics:

      ON      OFF
mean  1.4285  1.4360
max   1.4370  1.4441
min   1.4186  1.4303
std   0.0048  0.0039
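
For anyone who wants to double-check the summary, here is a minimal sketch that recomputes it from the raw numbers above. It assumes the reported std is the sample standard deviation (n - 1 denominator), which appears to match the table; the `summarize` helper is only illustrative and not part of the benchmark scripts.

```python
import statistics

# Per-run average latency in ms, copied from the table above.
# ON has 18 runs, OFF has 16 (the last two ON rows have no OFF counterpart).
on = [
    1.41855692, 1.42942288, 1.42626776, 1.43100398, 1.43247646, 1.43085669,
    1.4250376, 1.4246417, 1.42638928, 1.43390266, 1.42238418, 1.43701162,
    1.42529816, 1.42365005, 1.43174504, 1.42973357, 1.4347752, 1.43040477,
]
off = [
    1.44409019, 1.43027312, 1.43327031, 1.43726633, 1.43108837, 1.4360571,
    1.43052549, 1.44208097, 1.43673026, 1.43774178, 1.43597748, 1.43563187,
    1.43994857, 1.43631018, 1.43489774, 1.43393828,
]


def summarize(samples):
    """Return mean, max, min and sample std for a list of latencies."""
    return {
        "mean": statistics.mean(samples),
        "max": max(samples),
        "min": min(samples),
        # Assumption: sample standard deviation (divides by n - 1).
        "std": statistics.stdev(samples),
    }


on_stats = summarize(on)
off_stats = summarize(off)

print(f"{'':<6}{'ON':<8}{'OFF':<8}")
for name in ("mean", "max", "min", "std"):
    print(f"{name:<6}{on_stats[name]:<8.4f}{off_stats[name]:<8.4f}")

# Difference of the means, converted from ms to us.
gain_us = (off_stats["mean"] - on_stats["mean"]) * 1000.0
print(f"mean(OFF) - mean(ON) = {gain_us:.1f} us")
```

The gap between the two means is comparable in size to the spread between individual runs, so treating it as a marginal effect seems reasonable.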

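With all of them enabled, the mean latency improves by only about 8 us (1.4285 ms vs. 1.4360 ms, roughly 0.5%). That is a very small gain, so these options will not be kept for now.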
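
If someone wants to rerun the ON/OFF comparison later, the toggle could be scripted roughly as below. This is only a sketch: the option values are the ones listed in this comment, but `build_launch` and the `train.py` command are hypothetical placeholders, not the actual benchmark entry point.

```python
import os

# Options listed in the comment above.
TUNING_ENV = {
    "CUDA_DEVICE_MAX_CONNECTIONS": "32",
    "ONEFLOW_EP_CUDA_STREAM_FLAGS": "1",
    "ONEFLOW_RAW_READER_PREFETCHING_QUEUE_DEPTH": "512",
    "ONEFLOW_RAW_READER_NUM_WORKERS": "1",
    "LD_PRELOAD": "/usr/lib64/libjemalloc.so.1",
}
NUMA_PREFIX = ["numactl", "--interleave=all"]


def build_launch(train_cmd, enabled, base_env=None):
    """Return (argv, env) for one benchmark run.

    enabled=True  -> ON:  tuning variables set, command wrapped in numactl.
    enabled=False -> OFF: tuning variables removed, command run as-is.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        env.update(TUNING_ENV)
        return NUMA_PREFIX + list(train_cmd), env
    for key in TUNING_ENV:
        env.pop(key, None)
    return list(train_cmd), env


# Hypothetical training command; replace with the real benchmark script.
train_cmd = ["python3", "train.py"]
on_cmd, on_env = build_launch(train_cmd, enabled=True, base_env={})
off_cmd, off_env = build_launch(train_cmd, enabled=False, base_env={})
print(" ".join(on_cmd))
print(" ".join(off_cmd))
```

Each `(argv, env)` pair can then be passed to `subprocess.run(argv, env=env, check=True)` to launch the corresponding run.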