StreamHPC / openmm-hip-old


openmm-hip Testing and Benchmarks #1

Open tictooc opened 2 years ago

tictooc commented 2 years ago

This is not an issue; I just wanted to report that the conda version of this plugin, together with the StreamHPC/openmm fork, works without problems on ROCm 5.0.2 and the latest stable kernel (5.16.15).

test_openmm_hip.sh passes all tests.

Test Results

```
#1: TestHipAmoebaExtrapolatedPolarization Done
#2: TestHipAmoebaGeneralizedKirkwoodForce Done
#3: TestHipAmoebaMultipoleForce Done
#4: TestHipAmoebaTorsionTorsionForce Done
#5: TestHipAmoebaVdwForce Done
#6: TestHipAndersenThermostat Done
#7: TestHipBrownianIntegrator Done
#8: TestHipCheckpoints Done
#9: TestHipCMAPTorsionForce Done
#10: TestHipCMMotionRemover Done
#11: TestHipCompiler Done
#12: TestHipCompoundIntegrator Done
#13: TestHipCustomAngleForce Done
#14: TestHipCustomBondForce Done
#15: TestHipCustomCentroidBondForce Done
#16: TestHipCustomCompoundBondForce Done
#17: TestHipCustomCVForce Done
#18: TestHipCustomExternalForce Done
#19: TestHipCustomGBForce Done
#20: TestHipCustomHbondForce Done
#21: TestHipCustomIntegrator Done
#22: TestHipCustomManyParticleForce Done
#23: TestHipCustomNonbondedForce Done
#24: TestHipCustomTorsionForce Done
#25: TestHipDispersionPME Done
#26: TestHipDrudeForce Done
#27: TestHipDrudeLangevinIntegrator Done
#28: TestHipDrudeNoseHoover Done
#29: TestHipDrudeSCFIntegrator Done
#30: TestHipEwald Done
#31: TestHipFFTImplFFT3D Done
#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
Done
#33: TestHipFFTImplVkFFT Done
#34: TestHipGayBerneForce Done
#35: TestHipGBSAOBCForce Done
#36: TestHipHarmonicAngleForce Done
#37: TestHipHarmonicBondForce Done
#38: TestHipHippoNonbondedForce Done
#39: TestHipLangevinIntegrator Done
#40: TestHipLangevinMiddleIntegrator Done
#41: TestHipLocalEnergyMinimizer Done
#42: TestHipMonteCarloAnisotropicBarostat Done
#43: TestHipMonteCarloBarostat Done
#44: TestHipMonteCarloFlexibleBarostat Done
#45: TestHipMultipleForces Done
#46: TestHipNonbondedForce Done
#47: TestHipNoseHooverIntegrator Done
#48: TestHipPeriodicTorsionForce Done
#49: TestHipRandom Done
#50: TestHipRBTorsionForce Done
#51: TestHipRMSDForce Done
#52: TestHipRpmd Done
#53: TestHipSettle Done
#54: TestHipSort Done
#55: TestHipVariableLangevinIntegrator Done
#56: TestHipVariableVerletIntegrator Done
#57: TestHipVerletIntegrator Done
#58: TestHipVirtualSites Done
#59: TestHipWcaDispersionForce Done
---------------- All tests passed ----------------
```

All of the benchmarks run without issue except amber20-factorix (upstream issue #3391). The output below comes from the draft benchmark.py in #3386, with a few local changes for HIP and system info.

Full benchmark.py output ``` $ python benchmark_new_hip.py --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=OpenCL,HIP timestamp: 2022-03-20T17:05:34.618117 openmm_version: 7.7.0.dev-ce22dbe cpuinfo: AMD Ryzen Threadripper 3960X 24-Core Processor system: Linux kernel: 5.16.15-arch1-1-lean gpu: Vega 20 [Radeon VII] test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 58331 elapsed_time: 30.055014 ns_per_day: 670.74311128253 test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 107412 elapsed_time: 30.014081 ns_per_day: 1236.8057246197206 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 26619 elapsed_time: 29.868414 ns_per_day: 308.00183766034576 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': 
'/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 99106 elapsed_time: 29.946284 ns_per_day: 1143.749040782489 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 22134 elapsed_time: 29.426031 ns_per_day: 259.95726029106675 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 66469 elapsed_time: 29.941122 ns_per_day: 767.2286429346234 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 8716 elapsed_time: 30.673099 ns_per_day: 98.20493195030602 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 
'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 31438 elapsed_time: 30.389406 ns_per_day: 357.5250138156698 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 7624 elapsed_time: 32.082976 ns_per_day: 82.1262466424561 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 21071 elapsed_time: 30.273105 ns_per_day: 240.54809045851087 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 7131 elapsed_time: 31.659479 ns_per_day: 77.84315086170558 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 
16211 elapsed_time: 30.193642 ns_per_day: 185.55302470632722 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 191 elapsed_time: 27.422316 ns_per_day: 1.2035744902071728 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 3344 elapsed_time: 28.809248 ns_per_day: 20.05755929484865 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 1383 elapsed_time: 28.796577 ns_per_day: 8.298986369109077 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 1579 elapsed_time: 28.977509 ns_per_day: 
9.41596463657383 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 25188 elapsed_time: 30.44635 ns_per_day: 285.91186792505505 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 67339 elapsed_time: 30.016727 ns_per_day: 775.3129913198063 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 1521 elapsed_time: 31.051195 ns_per_day: 16.928739779580134 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 5463 elapsed_time: 29.788209 ns_per_day: 63.381212344790505 test: 
amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 480 elapsed_time: 29.724693 ns_per_day: 5.580814577294373 test: amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 2196 elapsed_time: 30.422483 ns_per_day: 24.946602813452134 ```
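For anyone who wants to reduce output like the above to an OpenCL-vs-HIP comparison, here is a minimal sketch. It assumes the improvement is computed as (HIP / OpenCL - 1) × 100, which is consistent with the percentages reported in this thread. The names `RECORD`, `improvement_percent` and `summarize` are made up for this example and are not part of benchmark.py; the sample text is two gbsa records from the run above with `platform_properties` abbreviated.

```python
import re

# One benchmark record: a "test:" key, later a "platform:" key, later "ns_per_day:".
# "platform_properties:" is skipped because the pattern requires the literal "platform:".
RECORD = re.compile(
    r"\btest:\s*(?P<test>\S+).*?\bplatform:\s*(?P<platform>\S+)"
    r".*?\bns_per_day:\s*(?P<ns>[0-9.]+)",
    re.DOTALL,
)


def improvement_percent(baseline, candidate):
    """Relative speedup of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


def summarize(text):
    """Map each test name to the HIP-over-OpenCL improvement in percent."""
    results = {}
    for match in RECORD.finditer(text):
        per_platform = results.setdefault(match["test"], {})
        per_platform[match["platform"]] = float(match["ns"])
    return {
        test: improvement_percent(p["OpenCL"], p["HIP"])
        for test, p in results.items()
        if "OpenCL" in p and "HIP" in p
    }


# Two records taken from the run above (platform_properties abbreviated).
sample = """
test: gbsa
platform: OpenCL
platform_properties: {'DeviceIndex': '0'}
ns_per_day: 670.74311128253
test: gbsa
platform: HIP
platform_properties: {'DeviceIndex': '0'}
ns_per_day: 1236.8057246197206
"""

for test, pct in summarize(sample).items():
    print(f"{test}: {pct:.0f}%")  # prints "gbsa: 84%"
```

Because the regex works on key/value pairs rather than line structure, it should also cope with output pasted as a single flattened line.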

OpenCL vs HIP Performance Summary

System

OS: Arch Linux \
Kernel: 5.16.15 \
ROCm Version: 5.0.2 \
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \
CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz \
GPU: AMD Radeon VII @ 2120 MHz core / 1200 MHz memory

| Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
| --- | --- | --- | --- |
| gbsa | 670.7 | 1236.8 | 84% |
| rf | 308.0 | 1143.7 | 271% |
| pme | 260.0 | 767.2 | 195% |
| apoa1rf | 98.2 | 357.5 | 264% |
| apoa1pme | 82.1 | 240.5 | 193% |
| apoa1ljpme | 77.8 | 185.6 | 138% |
| amoebagk | 1.2 | 20.1 | 1567% |
| amoebapme | 8.3 | 9.4 | 13% |
| amber20-dhfr | 285.9 | 775.3 | 171% |
| amber20-cellulose | 16.9 | 63.4 | 274% |
| amber20-stmv | 5.6 | 24.9 | 347% |

ex-rzr commented 2 years ago

@tictooc

Thank you very much for testing!

It's overclocked, right? What are its default frequencies? The specs say the GPU runs at 1400 MHz base and 1750 MHz boost, and the memory at 1000 MHz.

tictooc commented 2 years ago

Yes, those results were with the GPU heavily overclocked. At stock, the core clock boosts to 1775-1800 MHz and the memory speed is 1000 MHz.

Here is a run at stock clocks for comparison. It should fall close to the results expected on an MI50, since these are all single-precision benchmarks. The only change from stock is setting the perf level to high, to minimize noise from the somewhat inconsistent boost algorithm on Vega 20. Average clocks during the benchmark run below were 1770-1790 MHz.
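The thread does not say how the perf level was set. One common route on the amdgpu driver is the sysfs file `power_dpm_force_performance_level`, and the sketch below assumes that route. The function names, the `card0` default and the `drm_root` parameter are hypothetical choices for this example; the card index on a given machine may differ, and writing to the real file needs root.

```python
import tempfile
from pathlib import Path

# Levels handled by this sketch; amdgpu accepts others (e.g. "manual") that are left out.
LEVELS = ("auto", "low", "high")


def perf_level_file(card="card0", drm_root="/sys/class/drm"):
    """Path of the amdgpu performance-level file for one card."""
    return Path(drm_root) / card / "device" / "power_dpm_force_performance_level"


def set_perf_level(level, card="card0", drm_root="/sys/class/drm"):
    """Write the level and return what the file reports afterwards."""
    if level not in LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    path = perf_level_file(card, drm_root)
    path.write_text(level + "\n")
    return path.read_text().strip()


# Dry run against a throwaway directory, so the sketch runs without root
# and without touching a real GPU.
with tempfile.TemporaryDirectory() as fake_root:
    (Path(fake_root) / "card0" / "device").mkdir(parents=True)
    print(set_perf_level("high", drm_root=fake_root))  # prints "high"
```

On a real system the call would be `set_perf_level("high")` as root before benchmarking, and `set_perf_level("auto")` afterwards to hand control back to the driver.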

Full benchmark.py output ``` $ python benchmark_new_hip.py --outfile bench_opencl-HIP_stock.json --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=OpenCL,HIP --device=2 timestamp: 2022-03-21T11:18:49.514128 openmm_version: 7.7.0.dev-ce22dbe cpuinfo: AMD Ryzen Threadripper 3960X 24-Core Processor system: Linux kernel: 5.16.15-arch1-1-lean gpu: Vega 20 [Radeon VII] test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 50746 elapsed_time: 28.94034 ns_per_day: 605.9990172886703 test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 95612 elapsed_time: 29.996725 ns_per_day: 1101.570494779013 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 23620 elapsed_time: 29.824848 ns_per_day: 273.7003722533639 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 
'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 88283 elapsed_time: 30.015825 ns_per_day: 1016.4839647086161 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 19720 elapsed_time: 29.78403 ns_per_day: 228.8216873270675 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 59097 elapsed_time: 30.076975 ns_per_day: 679.0550977949079 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 7673 elapsed_time: 30.797974 ns_per_day: 86.1027027297315 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 
'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 27781 elapsed_time: 30.581991 ns_per_day: 313.94664918971426 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 6700 elapsed_time: 32.295395 ns_per_day: 71.69814767709141 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 18394 elapsed_time: 30.219251 ns_per_day: 210.3614811631168 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 6333 elapsed_time: 32.101047 ns_per_day: 68.18110325186588 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 
'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 14249 elapsed_time: 30.316464 ns_per_day: 162.43498582156548 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 166 elapsed_time: 27.44593 ns_per_day: 1.0451385688151211 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 2955 elapsed_time: 29.091936 ns_per_day: 17.552080411561466 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 1161 elapsed_time: 27.578498 ns_per_day: 7.2745368511367055 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 
'false'} steps: 1377 elapsed_time: 28.98677 ns_per_day: 8.208765585127281 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 21875 elapsed_time: 29.927539 ns_per_day: 252.61014612661594 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 59507 elapsed_time: 30.070277 ns_per_day: 683.9185152833809 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 1288 elapsed_time: 30.104517 ns_per_day: 14.786246196874705 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 4757 
elapsed_time: 29.699142 ns_per_day: 55.35578098518804 test: amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 418 elapsed_time: 29.779175 ns_per_day: 4.851067902317642 test: amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 1902 elapsed_time: 30.237341 ns_per_day: 21.739054369893168 ```

OpenCL vs HIP Performance Summary

System

OS: Arch Linux \
Kernel: 5.16.15 \
ROCm Version: 5.0.2 \
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \
CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz \
GPU: AMD Radeon VII @ 1750 MHz core / 1000 MHz memory

| Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
| --- | --- | --- | --- |
| gbsa | 606.0 | 1101.6 | 82% |
| rf | 273.7 | 1016.5 | 271% |
| pme | 228.8 | 679.1 | 197% |
| apoa1rf | 86.1 | 313.9 | 265% |
| apoa1pme | 71.7 | 210.4 | 193% |
| apoa1ljpme | 68.2 | 162.4 | 138% |
| amoebagk | 1.0 | 17.6 | 1579% |
| amoebapme | 7.3 | 8.2 | 13% |
| amber20-dhfr | 252.6 | 683.9 | 171% |
| amber20-cellulose | 14.8 | 55.4 | 274% |
| amber20-stmv | 4.9 | 21.7 | 348% |

tictooc commented 2 years ago

Additional test results with a 6900XT. The improvements are even greater than on the Radeon VII. The 6900XT was tested at default clocks; the only changes were the fan speed and setting the power limit to 293 W. This allowed the GPU to run at a higher boost clock in the HIP tests, which bump right up against the power limit when it is set to 293 W.

test_openmm_hip.sh reports one failed test: #32 (TestHipFFTImplHipFFT).

Test Results

```
tictoc@TickTockMedia $ ./test_openmm_hip.sh
#1: TestHipAmoebaExtrapolatedPolarization Done
#2: TestHipAmoebaGeneralizedKirkwoodForce Done
#3: TestHipAmoebaMultipoleForce Done
#4: TestHipAmoebaTorsionTorsionForce Done
#5: TestHipAmoebaVdwForce Done
#6: TestHipAndersenThermostat Done
#7: TestHipBrownianIntegrator Done
#8: TestHipCheckpoints Done
#9: TestHipCMAPTorsionForce Done
#10: TestHipCMMotionRemover Done
#11: TestHipCompiler Done
#12: TestHipCompoundIntegrator Done
#13: TestHipCustomAngleForce Done
#14: TestHipCustomBondForce Done
#15: TestHipCustomCentroidBondForce Done
#16: TestHipCustomCompoundBondForce Done
#17: TestHipCustomCVForce Done
#18: TestHipCustomExternalForce Done
#19: TestHipCustomGBForce Done
#20: TestHipCustomHbondForce Done
#21: TestHipCustomIntegrator Done
#22: TestHipCustomManyParticleForce Done
#23: TestHipCustomNonbondedForce Done
#24: TestHipCustomTorsionForce Done
#25: TestHipDispersionPME Done
#26: TestHipDrudeForce Done
#27: TestHipDrudeLangevinIntegrator Done
#28: TestHipDrudeNoseHoover Done
#29: TestHipDrudeSCFIntegrator Done
#30: TestHipEwald Done
#31: TestHipFFTImplFFT3D Done
#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
#33: TestHipFFTImplVkFFT Done
#34: TestHipGayBerneForce Done
#35: TestHipGBSAOBCForce Done
#36: TestHipHarmonicAngleForce Done
#37: TestHipHarmonicBondForce Done
#38: TestHipHippoNonbondedForce Done
#39: TestHipLangevinIntegrator Done
#40: TestHipLangevinMiddleIntegrator Done
#41: TestHipLocalEnergyMinimizer Done
#42: TestHipMonteCarloAnisotropicBarostat Done
#43: TestHipMonteCarloBarostat Done
#44: TestHipMonteCarloFlexibleBarostat Done
#45: TestHipMultipleForces Done
#46: TestHipNonbondedForce Done
#47: TestHipNoseHooverIntegrator Done
#48: TestHipPeriodicTorsionForce Done
#49: TestHipRandom Done
#50: TestHipRBTorsionForce Done
#51: TestHipRMSDForce Done
#52: TestHipRpmd Done
#53: TestHipSettle Done
#54: TestHipSort Done
#55: TestHipVariableLangevinIntegrator Done
#56: TestHipVariableVerletIntegrator Done
#57: TestHipVerletIntegrator Done
#58: TestHipVirtualSites Done
#59: TestHipWcaDispersionForce Done
------------ Failed tests ------------
#32 TestHipFFTImplHipFFT
```
Full benchmark.py output ``` $ python benchmark_new_hip.py --outfile bench_opencl-hip_6900XT_amdStagingKernel.json --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=HIP,OpenCL timestamp: 2022-05-24T00:02:23.087467 openmm_version: 7.7.0.dev-ce22dbe cpuinfo: AMD Ryzen 3 3200G with Radeon Vega Graphics system: Linux kernel: 5.16.0-1-amd-staging-drm-next-git-02007-g8bb14fbec5ae gpu: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 161892 elapsed_time: 29.922807 ns_per_day: 1869.8070404958999 test: gbsa constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 75024 elapsed_time: 30.117439 ns_per_day: 860.9063473159187 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 131862 elapsed_time: 30.028146 ns_per_day: 1517.626402908791 test: rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 
precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 31242 elapsed_time: 30.19554 ns_per_day: 357.5771521224657 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 92186 elapsed_time: 29.964265 ns_per_day: 1063.249227037606 test: pme cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 22985 elapsed_time: 29.415364 ns_per_day: 270.04989637388127 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 49923 elapsed_time: 30.482591 ns_per_day: 566.0079486025317 test: apoa1rf constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 
'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 11061 elapsed_time: 30.899304 ns_per_day: 123.71416521226494 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 35015 elapsed_time: 30.424812 ns_per_day: 397.74063353292036 test: apoa1pme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 9571 elapsed_time: 30.878581 ns_per_day: 107.1207773440107 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 28515 elapsed_time: 30.431483 ns_per_day: 323.83515453387525 test: apoa1ljpme constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 
'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 8888 elapsed_time: 30.55586 ns_per_day: 100.527126384268 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 6745 elapsed_time: 29.206119 ns_per_day: 39.90725368201093 test: amoebagk epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 382 elapsed_time: 27.458828 ns_per_day: 2.403948194729942 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 2982 elapsed_time: 29.458663 ns_per_day: 17.491954743499388 test: amoebapme epsilon: 1e-05 constraints: None hydrogen_mass: 1 ensemble: NVT timestep_in_fs: 2.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 2000 
elapsed_time: 29.5524 ns_per_day: 11.694481666463634 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 93363 elapsed_time: 29.9992 ns_per_day: 1075.57044187845 test: amber20-dhfr cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 24060 elapsed_time: 30.048 ns_per_day: 276.7284345047923 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 8601 elapsed_time: 30.356126 ns_per_day: 97.92111154104445 test: amber20-cellulose cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 2167 elapsed_time: 30.371548 ns_per_day: 
24.658446780519714 test: amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: HIP platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'} steps: 3101 elapsed_time: 31.046426 ns_per_day: 34.51945161095193 test: amber20-stmv cutoff: 0.9 constraints: HBonds hydrogen_mass: 1.5 ensemble: NVT timestep_in_fs: 4.0 precision: single platform: OpenCL platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'} steps: 687 elapsed_time: 29.641908 ns_per_day: 8.009848758723626 ```
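
The `ns_per_day` figures in the log appear to follow directly from `steps`, `timestep_in_fs` and `elapsed_time`. A minimal sketch of that relationship, assuming this is how the value is derived (the function name `ns_per_day` below is illustrative and is not taken from the benchmark script):

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(steps: int, timestep_fs: float, elapsed_s: float) -> float:
    """Simulated nanoseconds per wall-clock day (assumed formula)."""
    simulated_ns = steps * timestep_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


# Values copied from the gbsa / HIP record in the log above.
gbsa_hip = ns_per_day(steps=161892, timestep_fs=4.0, elapsed_s=29.922807)
print(f"gbsa HIP: {gbsa_hip:.1f} ns/day")
```

Because `elapsed_time` is printed to six decimals, a recomputed value may differ from the logged one in the last few digits.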

OpenCL vs HIP Performance Summary

System

OS: Arch Linux \
Kernel: 5.16.0-1-amd-staging-drm-next-git-02007-g8bb14fbec5ae \
ROCm Version: 5.1.1 \
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be \
CPU: AMD Ryzen 3 3200G \
GPU: AMD Radeon RX 6900 XT @ 2575core(OpenCL) 2505core(avg HIP)|2000mem

| Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
| --- | --- | --- | --- |
| gbsa | 860.9 | 1869.8 | 117% |
| rf | 357.6 | 1517.6 | 324% |
| pme | 270 | 1063.2 | 294% |
| apoa1rf | 123.7 | 566 | 358% |
| apoa1pme | 107.1 | 397.7 | 271% |
| apoa1ljpme | 100.5 | 323.8 | 222% |
| amoebagk | 2.4 | 39.9 | 1563% |
| amoebapme | 11.7 | 17.4 | 49% |
| amber20-dhfr | 276.7 | 1075.6 | 289% |
| amber20-cellulose | 24.7 | 97.9 | 296% |
| amber20-stmv | 8 | 34.5 | 331% |

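
For anyone who wants to recompute the last column, here is a small sketch. It assumes "Performance Improvement" means the relative gain of HIP over OpenCL, `(HIP / OpenCL - 1) * 100`, which matches most rows of the summary; the names `RESULTS` and `improvement_pct` are made up for this example.

```python
# (OpenCL ns/day, HIP ns/day) pairs copied from the summary table.
RESULTS = {
    "gbsa": (860.9, 1869.8),
    "rf": (357.6, 1517.6),
    "pme": (270.0, 1063.2),
    "apoa1rf": (123.7, 566.0),
    "apoa1pme": (107.1, 397.7),
    "apoa1ljpme": (100.5, 323.8),
    "amoebagk": (2.4, 39.9),
    "amoebapme": (11.7, 17.4),
    "amber20-dhfr": (276.7, 1075.6),
    "amber20-cellulose": (24.7, 97.9),
    "amber20-stmv": (8.0, 34.5),
}


def improvement_pct(opencl: float, hip: float) -> float:
    """Relative gain of HIP over OpenCL, in percent (assumed definition)."""
    return (hip / opencl - 1.0) * 100.0


for name, (opencl, hip) in RESULTS.items():
    speedup = hip / opencl
    print(f"{name:<18} {speedup:6.2f}x  {improvement_pct(opencl, hip):6.0f}%")
```

Under this definition an improvement of 117% corresponds to a speedup of about 2.17x, not 1.17x.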
ex-rzr commented 2 years ago

Thank you, @tictooc!

It's interesting that the hipFFT test fails on RDNA. We added this test because we ran into correctness issues for some FFT sizes with older versions of rocFFT, and now it is happening again. I guess we'll need to investigate further and report it to the rocFFT developers.
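
To illustrate what a per-size correctness sweep can look like, here is a rough sketch that uses NumPy as a stand-in. It is not the OpenMM test and does not call hipFFT or rocFFT; it only shows the idea of checking a real-to-complex transform against a full complex transform, plus a round trip, for the grid sizes printed in the test output. The names `SIZES` and `check_size` are hypothetical.

```python
import numpy as np

# (xsize, ysize, zsize) triples copied from the TestHipFFTImplHipFFT output.
SIZES = [
    (28, 25, 30), (28, 25, 25), (25, 28, 25), (25, 25, 28), (21, 25, 27),
    (49, 98, 14), (7, 21, 98), (98, 21, 21), (18, 98, 6), (98, 98, 98),
]


def check_size(shape, rng):
    """Return True if the real-to-complex FFT agrees with the reference."""
    data = rng.standard_normal(shape)

    # Reference: full complex-to-complex transform of the same real data.
    full = np.fft.fftn(data)
    # Real-to-complex keeps only the non-redundant half of the last axis.
    half = np.fft.rfftn(data)
    forward_ok = np.allclose(half, full[..., : shape[-1] // 2 + 1])

    # Round trip: the inverse transform should recover the input.
    back = np.fft.irfftn(half, s=shape, axes=(0, 1, 2))
    round_trip_ok = np.allclose(back, data)

    return forward_ok and round_trip_ok


rng = np.random.default_rng(0)
failures = [shape for shape in SIZES if not check_size(shape, rng)]
print("failed sizes:", failures)
```

A sweep like this reports which sizes fail, which is the kind of detail that helps when narrowing down a library regression.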

tictooc commented 2 years ago

The same test fails identically on Vega 20 (at least on the Radeon VII) running ROCm 5.1.3. I'll roll back to ROCm 5.0.2, and see if I can find the regression.

--Edit-- On the older version of ROCm, the FFT test went through a few different progressions but ultimately passed.