tictooc opened 2 years ago
@tictooc
Thank you very much for testing!
It's overclocked, right? What are its default frequencies? The specs list 1400 MHz base and 1750 MHz boost for the GPU, and 1000 MHz for the memory.
Yes, those results were with the GPU heavily overclocked. At stock, the core boosts to 1775-1800 MHz and the memory runs at 1000 MHz.
Here is a run at stock clocks for comparison. Since these are all single-precision benchmarks, the results should be close to what an MI50 would deliver. The only change from stock was setting the performance level to high, to reduce the noise caused by Vega 20's somewhat inconsistent boost algorithm. Average clocks during the benchmark run below were 1770-1790 MHz.
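On Linux, amdgpu exposes this setting through sysfs as `power_dpm_force_performance_level`. A minimal sketch of pinning it to `high` from Python, assuming the Radeon VII is DRM `card0` (the card index is an assumption and differs between machines) and that the script runs with root privileges:

```python
from pathlib import Path

# Assumption: the GPU is DRM card0; check /sys/class/drm/ for the right index.
PERF_LEVEL_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# A subset of the levels amdgpu accepts; "auto" is the driver default.
VALID_LEVELS = {"auto", "low", "high", "manual"}


def set_perf_level(level: str, path: Path = PERF_LEVEL_PATH) -> str:
    """Write a DPM performance level and return the value read back from the file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported performance level: {level!r}")
    path.write_text(level + "\n")  # requires root on a real sysfs node
    return path.read_text().strip()
```

Calling `set_perf_level("high")` before a run and `set_perf_level("auto")` afterwards returns the card to its default boost behavior.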
OS: Arch Linux \ Kernel: 5.16.15 \ ROCm Version: 5.0.2 \ OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \ CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz \ GPU: AMD Radeon VII @ 1750core|1000mem
Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
---|---|---|---|
gbsa | 606 | 1101.6 | 82% |
rf | 273.7 | 1016.5 | 271% |
pme | 228.8 | 679.1 | 197% |
apoa1rf | 86.1 | 313.9 | 265% |
apoa1pme | 71.7 | 210.4 | 193% |
apoa1ljpme | 68.2 | 162.4 | 138% |
amoebagk | 1.1 | 17.6 | 1500% |
amoebapme | 7.3 | 8.2 | 12% |
amber20-dhfr | 252.6 | 683.9 | 171% |
amber20-cellulose | 14.8 | 55.4 | 274% |
amber20-stmv | 4.8 | 21.7 | 352% |
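The Performance Improvement column is the relative gain of HIP over OpenCL, (HIP / OpenCL - 1) × 100. A small sketch that recomputes it from the ns/day columns (the function name is illustrative only):

```python
def improvement_pct(opencl_ns_day: float, hip_ns_day: float) -> float:
    """Relative gain of HIP over OpenCL, in percent: (HIP / OpenCL - 1) * 100."""
    if opencl_ns_day <= 0:
        raise ValueError("OpenCL throughput must be positive")
    return (hip_ns_day / opencl_ns_day - 1.0) * 100.0


# A few (OpenCL, HIP) ns/day pairs from the Radeon VII table above.
rows = {
    "gbsa": (606, 1101.6),
    "rf": (273.7, 1016.5),
    "apoa1rf": (86.1, 313.9),
    "amoebagk": (1.1, 17.6),
}

for name, (opencl, hip) in rows.items():
    print(f"{name}: {improvement_pct(opencl, hip):.0f}%")
```

For the rf row, this formula gives about 271%.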
Additional test results with a 6900XT. The improvements are even greater than on the Radeon VII. The 6900XT was tested at default clocks; the only changes were the fan speed and a power limit of 293 W. The higher power limit let the GPU sustain a higher boost clock during the HIP tests, which still run right up against the limit at 293 W.
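For reference, amdgpu reports and accepts the board power limit through its hwmon interface, where `power1_cap` and `power1_cap_max` are expressed in microwatts. A minimal sketch of applying a 293 W cap, assuming you already know the card's hwmon directory (something like `/sys/class/drm/card0/device/hwmon/hwmon2`; the exact path is hypothetical) and run as root:

```python
from pathlib import Path


def set_power_cap_watts(hwmon_dir: Path, watts: float) -> int:
    """Write a board power cap via amdgpu's hwmon power1_cap and return the value read back (microwatts)."""
    cap_file = hwmon_dir / "power1_cap"
    max_file = hwmon_dir / "power1_cap_max"
    microwatts = int(round(watts * 1_000_000))
    # Refuse values above the maximum the driver advertises, if it is exposed.
    if max_file.exists():
        limit = int(max_file.read_text())
        if microwatts > limit:
            raise ValueError(f"{watts} W exceeds power1_cap_max ({limit / 1e6} W)")
    cap_file.write_text(f"{microwatts}\n")  # requires root on a real sysfs node
    return int(cap_file.read_text())
```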
One test failed in test_openmm_hip.sh: #32 (TestHipFFTImplHipFFT).
OS: Arch Linux \ Kernel: 5.16.0-1-amd-staging-drm-next-git-02007-g8bb14fbec5ae \ ROCm Version: 5.1.1 \ OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \ CPU: AMD Ryzen 3200G \ GPU: AMD Radeon 6900XT @ 2575core(OpenCL) 2505core(avg HIP)|2000mem
Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
---|---|---|---|
gbsa | 860.9 | 1869.8 | 117% |
rf | 357.6 | 1517.6 | 324% |
pme | 270 | 1063.2 | 293% |
apoa1rf | 123.7 | 566 | 358% |
apoa1pme | 107.1 | 397.7 | 271% |
apoa1ljpme | 100.5 | 323.8 | 222% |
amoebagk | 2.4 | 39.9 | 1563% |
amoebapme | 11.7 | 17.4 | 49% |
amber20-dhfr | 276.7 | 1075.6 | 289% |
amber20-cellulose | 24.7 | 97.9 | 296% |
amber20-stmv | 8 | 34.5 | 331% |
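One way to put a single number on "even greater" is the geometric mean of the per-test HIP/OpenCL ratios, a common way to summarize speedups across a set of benchmarks. A sketch using the ns/day values from the two tables above:

```python
import math

# (OpenCL, HIP) ns/day pairs from the two tables above.
RADEON_VII = {
    "gbsa": (606, 1101.6), "rf": (273.7, 1016.5), "pme": (228.8, 679.1),
    "apoa1rf": (86.1, 313.9), "apoa1pme": (71.7, 210.4), "apoa1ljpme": (68.2, 162.4),
    "amoebagk": (1.1, 17.6), "amoebapme": (7.3, 8.2), "amber20-dhfr": (252.6, 683.9),
    "amber20-cellulose": (14.8, 55.4), "amber20-stmv": (4.8, 21.7),
}
RX_6900XT = {
    "gbsa": (860.9, 1869.8), "rf": (357.6, 1517.6), "pme": (270, 1063.2),
    "apoa1rf": (123.7, 566), "apoa1pme": (107.1, 397.7), "apoa1ljpme": (100.5, 323.8),
    "amoebagk": (2.4, 39.9), "amoebapme": (11.7, 17.4), "amber20-dhfr": (276.7, 1075.6),
    "amber20-cellulose": (24.7, 97.9), "amber20-stmv": (8, 34.5),
}


def speedups(results: dict[str, tuple[float, float]]) -> dict[str, float]:
    """HIP/OpenCL throughput ratio for each test."""
    return {test: hip / opencl for test, (opencl, hip) in results.items()}


def geomean_speedup(results: dict[str, tuple[float, float]]) -> float:
    """Geometric mean of the per-test ratios (mean of the logs, exponentiated)."""
    ratios = speedups(results).values()
    return math.exp(sum(math.log(r) for r in ratios) / len(results))


for label, data in (("Radeon VII", RADEON_VII), ("6900XT", RX_6900XT)):
    print(f"{label}: {geomean_speedup(data):.2f}x geometric-mean speedup")
```

Looking at the per-test ratios, the 6900XT gains more from HIP than the Radeon VII on every test except amber20-stmv.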
Thank you, @tictooc!
Interestingly, the hipFFT test fails on RDNA. We added this test because we ran into correctness issues for some FFT sizes with older versions of rocFFT, and now it is happening again. I guess we'll need to investigate further and report it to the rocFFT developers.
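For context, in NumPy's convention a real-to-complex transform keeps only n//2 + 1 outputs along the last axis, because the remaining coefficients follow from Hermitian symmetry. Below is a CPU-side sketch of the kind of correctness check involved, written with NumPy in double precision. It is an illustration, not the OpenMM test's actual code: `np.fft.rfftn` stands in for a GPU result, and the sizes are the `realToComplex: 1` cases printed by TestHipFFTImplHipFFT.

```python
import numpy as np

# The realToComplex: 1 grid sizes printed by TestHipFFTImplHipFFT.
SIZES = [
    (28, 25, 25), (25, 28, 25), (25, 25, 28), (21, 25, 27), (49, 98, 14),
    (7, 21, 98), (98, 21, 21), (18, 98, 6), (98, 98, 98),
]


def r2c_reference(x: np.ndarray) -> np.ndarray:
    """Full complex FFT, trimmed to the n//2 + 1 non-redundant outputs on the last axis."""
    return np.fft.fftn(x)[..., : x.shape[-1] // 2 + 1]


def r2c_matches(candidate: np.ndarray, x: np.ndarray, tol: float = 1e-5) -> bool:
    """Compare a real-to-complex result with the reference (max error relative to the largest coefficient)."""
    ref = r2c_reference(x)
    if candidate.shape != ref.shape:
        return False
    err = np.max(np.abs(candidate - ref)) / np.max(np.abs(ref))
    return bool(err < tol)


rng = np.random.default_rng(0)
for shape in SIZES:
    x = rng.standard_normal(shape)
    # np.fft.rfftn stands in for a GPU result copied back to the host.
    status = "ok" if r2c_matches(np.fft.rfftn(x), x) else "MISMATCH"
    print(shape, status)
```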
The same test fails identically on Vega 20 (at least on the Radeon VII) running ROCm 5.1.3. I'll roll back to ROCm 5.0.2 and see if I can find the regression.
--Edit-- On the older ROCm version, the FFT test went through a few different progressions but ultimately passed.
This is not an issue; I just wanted to report that the conda version of this plugin, along with the StreamHPC/openmm fork, works without issue on ROCm 5.0.2 and the latest stable kernel (5.16.15).
test_openmm_hip.sh passes all tests.
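When scanning a longer log, the per-test markers can be checked mechanically. A small sketch, assuming (hypothetically) that a test which fails or aborts never prints `Done` before the next `#N:` marker; this format is inferred from the log in this thread, not from the script's source:

```python
import re

# "#32: TestHipFFTImplHipFFT" -> test number and name (the name may follow a newline).
TEST_MARKER = re.compile(r"#(\d+):\s+(\w+)")


def unfinished_tests(log: str) -> list[tuple[int, str]]:
    """Return (number, name) for each test whose output has no 'Done' before the next marker."""
    markers = list(TEST_MARKER.finditer(log))
    unfinished = []
    for i, m in enumerate(markers):
        # This test's output runs until the next marker, or the end of the log.
        end = markers[i + 1].start() if i + 1 < len(markers) else len(log)
        if "Done" not in log[m.end():end]:
            unfinished.append((int(m.group(1)), m.group(2)))
    return unfinished
```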
Test Results
```
#1: TestHipAmoebaExtrapolatedPolarization Done
#2: TestHipAmoebaGeneralizedKirkwoodForce Done
#3: TestHipAmoebaMultipoleForce Done
#4: TestHipAmoebaTorsionTorsionForce Done
#5: TestHipAmoebaVdwForce Done
#6: TestHipAndersenThermostat Done
#7: TestHipBrownianIntegrator Done
#8: TestHipCheckpoints Done
#9: TestHipCMAPTorsionForce Done
#10: TestHipCMMotionRemover Done
#11: TestHipCompiler Done
#12: TestHipCompoundIntegrator Done
#13: TestHipCustomAngleForce Done
#14: TestHipCustomBondForce Done
#15: TestHipCustomCentroidBondForce Done
#16: TestHipCustomCompoundBondForce Done
#17: TestHipCustomCVForce Done
#18: TestHipCustomExternalForce Done
#19: TestHipCustomGBForce Done
#20: TestHipCustomHbondForce Done
#21: TestHipCustomIntegrator Done
#22: TestHipCustomManyParticleForce Done
#23: TestHipCustomNonbondedForce Done
#24: TestHipCustomTorsionForce Done
#25: TestHipDispersionPME Done
#26: TestHipDrudeForce Done
#27: TestHipDrudeLangevinIntegrator Done
#28: TestHipDrudeNoseHoover Done
#29: TestHipDrudeSCFIntegrator Done
#30: TestHipEwald Done
#31: TestHipFFTImplFFT3D Done
#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
Done
#33: TestHipFFTImplVkFFT Done
#34: TestHipGayBerneForce Done
#35: TestHipGBSAOBCForce Done
#36: TestHipHarmonicAngleForce Done
#37: TestHipHarmonicBondForce Done
#38: TestHipHippoNonbondedForce Done
#39: TestHipLangevinIntegrator Done
#40: TestHipLangevinMiddleIntegrator Done
#41: TestHipLocalEnergyMinimizer Done
#42: TestHipMonteCarloAnisotropicBarostat Done
#43: TestHipMonteCarloBarostat Done
#44: TestHipMonteCarloFlexibleBarostat Done
#45: TestHipMultipleForces Done
#46: TestHipNonbondedForce Done
#47: TestHipNoseHooverIntegrator Done
#48: TestHipPeriodicTorsionForce Done
#49: TestHipRandom Done
#50: TestHipRBTorsionForce Done
#51: TestHipRMSDForce Done
#52: TestHipRpmd Done
#53: TestHipSettle Done
#54: TestHipSort Done
#55: TestHipVariableLangevinIntegrator Done
#56: TestHipVariableVerletIntegrator Done
#57: TestHipVerletIntegrator Done
#58: TestHipVirtualSites Done
#59: TestHipWcaDispersionForce Done
----------------
All tests passed
----------------
```

All of the benchmarks except amber20-factorix (upstream issue #3391) run without issue. The benchmark.py output below was produced with the draft benchmark.py from #3386, plus a few local changes for HIP and system info.
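The `ns_per_day` figures in the benchmark.py output can be reproduced from the other fields of each record: simulated time is `steps × timestep_in_fs`, scaled to a 24-hour day by `elapsed_time`. A minimal check in Python (the helper name is illustrative), which agrees with the reported gbsa and amoebagk values to within rounding of `elapsed_time`:

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(steps: int, timestep_fs: float, elapsed_s: float) -> float:
    """Simulated nanoseconds per 24 hours of wall-clock time."""
    simulated_ns = steps * timestep_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


# gbsa (4 fs step) on OpenCL and HIP, and amoebagk on OpenCL (2 fs step).
print(ns_per_day(58331, 4.0, 30.055014))   # benchmark.py reported 670.74311128253
print(ns_per_day(107412, 4.0, 30.014081))  # benchmark.py reported 1236.8057246197206
print(ns_per_day(191, 2.0, 27.422316))     # benchmark.py reported 1.2035744902071728
```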
Full benchmark.py output
```
$ python benchmark_new_hip.py --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=OpenCL,HIP
timestamp: 2022-03-20T17:05:34.618117
openmm_version: 7.7.0.dev-ce22dbe
cpuinfo: AMD Ryzen Threadripper 3960X 24-Core Processor
system: Linux
kernel: 5.16.15-arch1-1-lean
gpu: Vega 20 [Radeon VII]

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 58331
elapsed_time: 30.055014
ns_per_day: 670.74311128253

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 107412
elapsed_time: 30.014081
ns_per_day: 1236.8057246197206

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 26619
elapsed_time: 29.868414
ns_per_day: 308.00183766034576

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 99106
elapsed_time: 29.946284
ns_per_day: 1143.749040782489

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 22134
elapsed_time: 29.426031
ns_per_day: 259.95726029106675

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 66469
elapsed_time: 29.941122
ns_per_day: 767.2286429346234

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 8716
elapsed_time: 30.673099
ns_per_day: 98.20493195030602

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 31438
elapsed_time: 30.389406
ns_per_day: 357.5250138156698

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 7624
elapsed_time: 32.082976
ns_per_day: 82.1262466424561

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 21071
elapsed_time: 30.273105
ns_per_day: 240.54809045851087

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 7131
elapsed_time: 31.659479
ns_per_day: 77.84315086170558

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 16211
elapsed_time: 30.193642
ns_per_day: 185.55302470632722

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 191
elapsed_time: 27.422316
ns_per_day: 1.2035744902071728

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 3344
elapsed_time: 28.809248
ns_per_day: 20.05755929484865

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1383
elapsed_time: 28.796577
ns_per_day: 8.298986369109077

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 1579
elapsed_time: 28.977509
ns_per_day: 9.41596463657383

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 25188
elapsed_time: 30.44635
ns_per_day: 285.91186792505505

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 67339
elapsed_time: 30.016727
ns_per_day: 775.3129913198063

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1521
elapsed_time: 31.051195
ns_per_day: 16.928739779580134

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 5463
elapsed_time: 29.788209
ns_per_day: 63.381212344790505

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 480
elapsed_time: 29.724693
ns_per_day: 5.580814577294373

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 2196
elapsed_time: 30.422483
ns_per_day: 24.946602813452134
```

OpenCL vs HIP Performance Summary
System
OS: Arch Linux \ Kernel: 5.16.15 \ ROCm Version: 5.0.2 \ OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \ CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz \ GPU: AMD Radeon VII @ 2120core|1200mem