Closed ptheywood closed 2 years ago
Not adding ResNet benchmark results to the WMLCE / OpenCE documentation:
- ddlrun errors on RHEL 8 installations of WMLCE, so the benchmark is not useful for comparing RHEL 7 / RHEL 8 TensorFlow performance.
- tensorflow-benchmarks has an IBM licence that does not appear to allow redistribution outside of WMLCE, so it is not useful for direct porting to Open-CE based multi-node TensorFlow. It would be better to find a more open benchmark if we wish to compare, but users have no option but to migrate for security purposes anyway.

It may be nice to add some general DL benchmarking to compare against x86+V100 systems, to help encourage users onto Bede, but that can become a future issue rather than blocking the WMLCE/OpenCE clarification.
The ResNet50 benchmark job scripts are no longer usable on Bede, as /opt/software/apps/anaconda3 does not exist.

Additionally, with the move to RHEL 8, where WMLCE is not supported (it is replaced by OpenCE), it is unclear whether ddlrun, and therefore bede-ddlrun, will be usable.

It may be worth re-benchmarking ResNet50 prior to the RHEL 8 switch so we know the performance impact of WMLCE vs OpenCE?
I.e. run ResNet50 at a number of scales (1, 2, 4, 8, 12?, 16? GPUs; current docs say there is no need to go larger) with:
The current RHEL 8 testing partition only contains 2 nodes, so only up to 8 GPUs are currently usable for RHEL 8.
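
As a rough sketch of which of the proposed scales could run where, the snippet below maps each GPU count to the number of nodes it needs. It assumes 4 GPUs per node, which is only inferred from "2 nodes, so only up to 8 GPUs" above; the function and constant names are hypothetical and not part of any Bede tooling.

```python
import math

# Assumption: 4 GPUs per node, inferred from "2 nodes -> up to 8 GPUs".
GPUS_PER_NODE = 4
# GPU counts proposed in this issue (12 and 16 were marked as uncertain).
SCALES = [1, 2, 4, 8, 12, 16]
# Size of the RHEL 8 testing partition stated in this issue.
RHEL8_TEST_NODES = 2


def plan(scales, gpus_per_node, nodes_available):
    """Return (gpus, nodes_needed, fits) for each requested GPU count."""
    rows = []
    for gpus in scales:
        nodes_needed = math.ceil(gpus / gpus_per_node)
        rows.append((gpus, nodes_needed, nodes_needed <= nodes_available))
    return rows


if __name__ == "__main__":
    for gpus, nodes, fits in plan(SCALES, GPUS_PER_NODE, RHEL8_TEST_NODES):
        status = "ok" if fits else "too large for RHEL 8 test partition"
        print(f"{gpus:>2} GPUs -> {nodes} node(s): {status}")
```

Under that assumption, the 1, 2, 4 and 8 GPU runs fit on the RHEL 8 testing partition, while the 12 and 16 GPU runs would only be possible on the RHEL 7 side for now.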
This is closely related to #63