Update the HugeCTR test scripts, log extraction tools, wdl.py, and README
scripts
├── 300k_iters.sh # 300k iterations test, display loss and auc every 1000 iterations.
├── 500_iters.sh # 500 iterations test, display loss and auc every iteration.
├── bsz_x2.sh # batch size doubling test
├── fix_bsz_per_device.sh # test with different number of devices and fixing batch size per device
├── fix_total_bsz.sh # test with different number of devices and fixing total batch size
└── gpu_memory_usage.py # log maximum GPU device memory usage during testing
tools
├── extract_hugectr_logs.py # Usage: python extract_hugectr_logs.py --benchmark_log_dir <directory containing the log files>
└── extract_losses_aucs.sh # Usage: $ ./extract_losses_aucs.sh logfile