IO500 / io500

IO500 Storage Benchmark source code
MIT License

Stonewalling: IOR Hard on Lustre and HDD takes extremely long #14

Closed JulianKunkel closed 2 years ago

JulianKunkel commented 3 years ago

Opened this issue to track the situation of IOR hard taking many hours when running on spinning disks. On Lustre systems with spinning disks, such as those at DKRZ and Archer 2, IOR hard takes unbearably long.

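Inspecting ior-hard-write.txt for a test run with [debug] stonewall-time = 1 shows the imbalance.

For example, a run on 10 nodes with 8 procs each leads to:

stonewalling pairs accessed min: 21 max: 3195 -- min data: 0.0 GiB mean data: 0.0 GiB time: 1.2s

The overall runtime here was then 16.5s, since after the stonewall every process still has to catch up to the 3195 pairs written by the fastest one.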
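To quantify the imbalance from such a run, the stonewalling line can be parsed with a few lines of Python. This is only a sketch: the regular expression is written against the line quoted above and may need adjusting for other IOR versions, and the function name `parse_stonewall` is hypothetical.

```python
import re

# Pattern matching the stonewalling summary line as quoted in this issue.
# Assumption: other IOR versions may format this line differently.
STONEWALL_RE = re.compile(
    r"stonewalling pairs accessed min:\s*(\d+)\s+max:\s*(\d+)"
    r".*?time:\s*([\d.]+)s"
)


def parse_stonewall(line):
    """Return min/max pairs, stonewall time and imbalance, or None if no match."""
    m = STONEWALL_RE.search(line)
    if m is None:
        return None
    min_pairs, max_pairs = int(m.group(1)), int(m.group(2))
    return {
        "min_pairs": min_pairs,
        "max_pairs": max_pairs,
        "stonewall_s": float(m.group(3)),
        # Ratio between the fastest and the slowest process at the stonewall.
        "imbalance": max_pairs / min_pairs if min_pairs else float("inf"),
    }


line = ("stonewalling pairs accessed min: 21 max: 3195 -- "
        "min data: 0.0 GiB mean data: 0.0 GiB time: 1.2s")
stats = parse_stonewall(line)
total_runtime_s = 16.5  # overall runtime reported for this run

# How much longer the phase ran than the configured stonewall.
stretch = total_runtime_s / stats["stonewall_s"]
print(f"imbalance (max/min pairs): {stats['imbalance']:.0f}x")
print(f"runtime stretch (total/stonewall): {stretch:.2f}x")
```

For the run above, the fastest process had written roughly 150 times as many pairs as the slowest one when the stonewall hit, and the phase ran about 14 times longer than the stonewall time.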

With the regular 5-minute stonewall, this imbalance can stretch the runtime even further, causing the hard phase to take many hours (it is often killed by 8-hour deadlines).

The only suitable workaround is to reduce the segment count to a bearable number, e.g.:

[ior-hard]
segmentCount = 10000
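To get a feeling for what a given segmentCount means in data volume, here is a rough estimate in Python. It assumes that each segment contains one transfer per process and that ior-hard uses a transfer size of 47008 bytes; both are assumptions about the benchmark setup rather than something measured in this issue, and the helper name `ior_hard_bytes` is hypothetical.

```python
def ior_hard_bytes(segment_count, num_procs, transfer_size=47008):
    """Estimate the total bytes written by ior-hard.

    Assumptions: one transfer per process per segment, and a
    transfer size of 47008 bytes (believed to be the ior-hard default).
    """
    return segment_count * num_procs * transfer_size


# 10 nodes with 8 procs each, as in the example run above.
total = ior_hard_bytes(segment_count=10_000, num_procs=10 * 8)
print(f"{total} bytes = {total / 2**30:.1f} GiB")
```

Under these assumptions, segmentCount = 10000 on 80 processes corresponds to about 35 GiB in total, which caps how much data the slowest process can still be forced to write after the stonewall.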