It may be beneficial to report timings for init_arrays and read_arrays.
This is useful for measuring migration performance of USM models.
In the extreme case, a page migration heuristic that pins data on the device and never migrates it to the host will show normal bandwidth for the five kernels, but the benchmark will take considerably longer to complete.
Most of that extra time will be spent copying between host and device (in init_arrays and read_arrays).
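
A minimal sketch of what this could look like, assuming a simplified host-only stand-in for the benchmark class. `HostStream`, `time_seconds`, `TransferTimings` and `time_transfers` are hypothetical names used for illustration only, and the initial values are arbitrary; only init_arrays and read_arrays come from the request above.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a benchmark backend. A real USM backend would
// touch device-accessible allocations here instead of std::vector.
struct HostStream {
    std::vector<double> a, b, c;

    explicit HostStream(std::size_t n) : a(n), b(n), c(n) {}

    // Fill the arrays with their initial values.
    void init_arrays(double init_a, double init_b, double init_c) {
        for (std::size_t i = 0; i < a.size(); ++i) {
            a[i] = init_a;
            b[i] = init_b;
            c[i] = init_c;
        }
    }

    // Copy the arrays back into caller-owned host buffers.
    void read_arrays(std::vector<double>& ha,
                     std::vector<double>& hb,
                     std::vector<double>& hc) const {
        ha = a;
        hb = b;
        hc = c;
    }
};

// Run f once and return the elapsed wall-clock time in seconds.
// Assumption: f is synchronous. An asynchronous device API would need an
// explicit synchronisation inside f before the timer is stopped.
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Timings for the two phases that sit outside the kernel loop.
struct TransferTimings {
    double init_s;
    double read_s;
};

TransferTimings time_transfers(HostStream& stream,
                               std::vector<double>& ha,
                               std::vector<double>& hb,
                               std::vector<double>& hc) {
    TransferTimings t{};
    t.init_s = time_seconds([&] { stream.init_arrays(0.1, 0.2, 0.0); });
    // The five timed kernels would run here in the real benchmark.
    t.read_s = time_seconds([&] { stream.read_arrays(ha, hb, hc); });
    return t;
}
```

Reporting `init_s` and `read_s` alongside the per-kernel bandwidth would make the cost of a migration policy visible even when the kernel numbers look normal.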