Oneflow-Inc / DLPerf

DeepLearning Framework Performance Profiling Toolkit
Apache License 2.0
276 stars 27 forks

Modify hugectr testing script #148

Closed ccddyy416 closed 2 years ago

ccddyy416 commented 2 years ago

Modified the hugectr test scripts, the log extraction tools, wdl.py, and the readme.

scripts
├── 300k_iters.sh # 300k-iteration test; displays loss and AUC every 1000 iterations
├── 500_iters.sh # 500-iteration test; displays loss and AUC every iteration
├── bsz_x2.sh # batch size doubling test
├── fix_bsz_per_device.sh # test with different numbers of devices, fixing the batch size per device
├── fix_total_bsz.sh # test with different numbers of devices, fixing the total batch size
├── gpu_memory_usage.py # logs the maximum GPU device memory usage during testing
tools
├── extract_hugectr_logs.py # Usage: python extract_hugectr_logs.py --benchmark_log_dir <directory containing the log files>
└── extract_losses_aucs.sh # Usage: ./extract_losses_aucs.sh logfile
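
For anyone curious what tracking peak GPU memory during a test run can look like, here is a minimal sketch. It is not the gpu_memory_usage.py from this PR; it only assumes that `nvidia-smi` is on the PATH and reports `memory.used` as plain integers in MiB. The function names (`parse_used_mib`, `update_peak`, `monitor`) and the polling defaults are hypothetical.

```python
import subprocess
import time

# The nvidia-smi query flags below are real; everything else is a hypothetical sketch.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse one memory.used value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def update_peak(peak, sample):
    """Return the element-wise maximum of the running peak and a new sample."""
    if not peak:
        return list(sample)
    return [max(p, s) for p, s in zip(peak, sample)]


def monitor(duration_s=10.0, interval_s=1.0):
    """Poll nvidia-smi for duration_s seconds and return the peak MiB per GPU."""
    peak = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        peak = update_peak(peak, parse_used_mib(result.stdout))
        time.sleep(interval_s)
    return peak
```

Calling `monitor(duration_s=60)` in a background process while a test script runs would return one peak value per visible GPU. Polling misses spikes shorter than the interval, so the result is a lower bound on the true maximum.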