The fs_test code basically does this:
    malloc(buffer, iosize)
    memset(buffer, rank, iosize)
    start_timer()
    if barrier_before_open barrier()
    open_file()
    if barrier_after_open barrier()
    end_timer()
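For large IO sizes, the malloc and the memset can take a while, and how long they take can vary a lot from rank to rank. Because the timer starts before any barrier, the fast ranks spend timed seconds sitting in a barrier waiting for the slow ones. So the time difference between the slowest rank to malloc/memset and the fastest rank to malloc/memset shows up in our open time.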
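To put numbers on that, here is a small back-of-the-envelope model in Python. It is only a sketch, not fs_test code: `measured_open_times` is a made-up helper, the setup times and open cost are illustrative values rather than measurements, and it assumes both barriers are enabled and that the barriers themselves cost nothing.

```python
def measured_open_times(setup_times, open_cost):
    """Per-rank open time as seen by the timer in the ordering above.

    Assumptions (not from fs_test): both barriers are enabled, every
    rank's open takes open_cost seconds, and barriers are free.
    """
    # barrier_before_open releases only when the slowest rank arrives
    release = max(setup_times)
    # every rank then opens, so all of them finish at the same moment
    finish = release + open_cost
    # each rank started its timer as soon as its own malloc/memset ended
    return [finish - start for start in setup_times]


# Illustrative numbers only: four ranks whose malloc/memset took
# between 1.0 and 3.0 seconds, and an open that really costs 0.5 s.
setup = [1.0, 1.5, 2.0, 3.0]
times = measured_open_times(setup, open_cost=0.5)
print(times)       # [2.5, 2.0, 1.5, 0.5]
print(max(times))  # 2.5
```

In this model the fastest rank reports 2.5 s for an open that costs 0.5 s. The extra 2.0 s is exactly the gap between the slowest and fastest malloc/memset.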
The fix is easy: just add a barrier before start_timer(), so every rank has finished its malloc/memset before any rank starts timing. By the way, I figured this out because the inflated open time can hurt the effective write bandwidth if you're not measuring with huge amounts of data.
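The effect of the extra barrier can be tried out without an MPI job. The sketch below is a simulation, not the fs_test source: Python threads stand in for ranks, `time.sleep` stands in for malloc/memset and for the open, and `slowest_open_time`, the delays, and the open cost are all hypothetical.

```python
import threading
import time


def slowest_open_time(setup_delays, open_cost, barrier_before_timer):
    """Run one thread per simulated rank; return the largest timed 'open'."""
    n = len(setup_delays)
    barrier = threading.Barrier(n)  # reusable across successive waits
    elapsed = [0.0] * n

    def rank(i):
        time.sleep(setup_delays[i])       # stand-in for malloc + memset
        if barrier_before_timer:
            barrier.wait()                # the fix: absorb setup skew untimed
        start = time.perf_counter()       # start_timer()
        barrier.wait()                    # barrier_before_open
        time.sleep(open_cost)             # stand-in for open_file()
        barrier.wait()                    # barrier_after_open
        elapsed[i] = time.perf_counter() - start  # end_timer()

    threads = [threading.Thread(target=rank, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(elapsed)


delays = [0.0, 0.1, 0.3]  # hypothetical per-rank setup times, in seconds
without_fix = slowest_open_time(delays, open_cost=0.05, barrier_before_timer=False)
with_fix = slowest_open_time(delays, open_cost=0.05, barrier_before_timer=True)
print(f"without barrier: {without_fix:.2f} s, with barrier: {with_fix:.2f} s")
```

Without the extra barrier, the reported time should come out near the setup skew plus the open cost. With it, the reported time should drop to roughly the open cost alone. Exact figures depend on thread scheduling.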