bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality

Merge tag tracking logic from Replicant #656

Open drichmond opened 2 years ago

drichmond commented 2 years ago

Supersedes 793

When we run in no-profiling mode, we still get stats packets, so we are throwing away valuable timing information in exchange for a 4-5x faster runtime. Why not keep it by emitting the packet arrival times to a separate file (simple_stats.csv)? That way we get the faster runtime and still have timing information. This is useful when we're gathering results rather than iterating.

From BSG Replicant:

One of the more annoying aspects of the profiler is that it runs very slowly compared to non-profiled mode, and even pc-histogram mode. This is a problem, because the profiler is also a good way to get accurate execution timing for HB kernels.

The intent of this PR is to create a "simple stats" file that gets generated during non-profiling runs. All it does is emit the arrival times and tags of statistics packets that arrive at the host interface. This can be parsed (separate script, still in development) to provide accurate timing information of kernels, without the performance hit of the profiler.
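As a rough illustration of the kind of post-processing described above (this is not the actual parsing script, which is still in development), the sketch below pairs up stats packets by tag and reports the elapsed time between them. The file layout is an assumption: a header row `time,tag` followed by one row per packet, where the first packet with a given tag is treated as a start marker and the next packet with the same tag as its end marker. The function name `kernel_durations` and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def kernel_durations(csv_text):
    """Pair successive arrivals of each tag and return elapsed times.

    ASSUMED layout (not taken from the PR): header `time,tag`, then one
    row per stats packet. First packet with a tag = start, next packet
    with the same tag = end.
    """
    open_starts = {}               # tag -> arrival time of unmatched start
    durations = defaultdict(list)  # tag -> list of elapsed times
    for row in csv.DictReader(io.StringIO(csv_text)):
        tag = int(row["tag"])
        time = int(row["time"])
        if tag in open_starts:
            # Second arrival of this tag closes the interval.
            durations[tag].append(time - open_starts.pop(tag))
        else:
            # First arrival of this tag opens an interval.
            open_starts[tag] = time
    return dict(durations)


# Hypothetical sample data, for illustration only.
sample = """time,tag
100,0
250,1
900,1
1300,0
"""
result = kernel_durations(sample)
print(result)
```

If the real file encodes start and end markers differently (for example, in separate columns or in bits of the tag), only the pairing logic inside the loop would need to change.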

My rough estimate is that this is 4-5x faster for long-running kernels. This will be especially helpful for obtaining results in minimal time.

This PR merges the module from BSG Replicant and the changes from 793, so that we provide the same functionality in both repositories. If we only make the change in Replicant, nobody will get this benefit in manycore.

In this PR I removed some of the wires from spmd_testbench, and moved them into the bsg_nonsynth_manycore_testbench. Corresponding changes will be made in bsg_replicant.

Two bits of weirdness. First, we now refer to some of the global-scope wires inside of the testbench:

        ,.print_stat_v_i      ($root.`HOST_MODULE_PATH.testbench.print_stat_v)
        ,.print_stat_tag_i    ($root.`HOST_MODULE_PATH.testbench.print_stat_tag)

Instead of:

        ,.print_stat_v_i      ($root.`HOST_MODULE_PATH.print_stat_v)
        ,.print_stat_tag_i    ($root.`HOST_MODULE_PATH.print_stat_tag)

Second, the IO complex no longer tracks stats packets. I figured that the benefits of unified code and the 4-5x improvement mentioned above outweigh these drawbacks, but I'm open to other solutions.

drichmond commented 2 years ago

For reference, when I run test_profiler in bsg_replicant:

Without the profiler: 28 seconds

With the profiler: 117 seconds