One of the more annoying aspects of the profiler is that it runs very slowly compared to non-profiled mode, and even compared to pc-histogram mode. This is a problem because the profiler is also a good way to get accurate execution timing for HB kernels.
The intent of this PR is to create a "simple stats" file that is generated during non-profiling runs. It records only the arrival time and tag of each statistics packet that arrives at the host interface. The file can be parsed (by a separate script, still in development) to provide accurate kernel timing information without the performance hit of the profiler.
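As a rough illustration of what the parsing step could look like, here is a minimal sketch. The file layout is an assumption, not something this PR defines: it supposes one `time,tag` pair per line with `#` comment lines, and the function name `kernel_durations` is hypothetical. It simply reports, for each tag, the span between the first and last packet seen with that tag.

```python
import csv
import io
from collections import defaultdict


def kernel_durations(stream):
    """Return {tag: last_arrival - first_arrival} for each tag in the stream.

    ASSUMPTION: each record is "time,tag" with an integer arrival time.
    The real simple-stats format may differ.
    """
    arrivals = defaultdict(list)
    for row in csv.reader(stream):
        # Skip blank lines and comment/header lines starting with '#'.
        if not row or row[0].startswith("#"):
            continue
        time, tag = int(row[0]), row[1].strip()
        arrivals[tag].append(time)
    # Elapsed time per tag: latest arrival minus earliest arrival.
    return {tag: max(times) - min(times) for tag, times in arrivals.items()}


# Made-up sample data in the assumed layout, for illustration only.
sample = io.StringIO("# time,tag\n100,0\n250,1\n900,0\n1300,1\n")
durations = kernel_durations(sample)
print(durations)
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample. How tags map to kernel start and end events is left to the actual script.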
My rough estimate is that this is 4-5x faster than running under the profiler for long-running kernels. This will be especially helpful when results are needed quickly.
Use this to get profiler-like timing information when running in exec mode.