bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality

Faster PC Histogram #644

Closed mrutt92 closed 2 years ago

mrutt92 commented 2 years ago

We currently have a PC histogram tool that parses a cycle-by-cycle trace file from simulation. The functionality is very helpful, but it comes at a ~3x cost in simulation time, and the trace file can grow to GBs in size for even relatively short simulations.

This PR provides a fast, functional PC histogram. The histogram is stored in memory while the simulation is running and dumped to a file at the end of simulation. Some detail is lost, such as the instruction opcode (instead we see either 'instr' or 'fp_instr'). However, the stall type information is retained.
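To illustrate the idea (this is only a minimal sketch, not the code in this PR): the histogram can be thought of as a counter keyed by PC and event kind, updated once per cycle in memory and written out once at the end. The class name `PCHistogram`, the CSV layout, and the stall label `stall_depend` are hypothetical and chosen only for the example; 'instr' and 'fp_instr' are the two instruction kinds mentioned above.

```python
from collections import Counter
import io


class PCHistogram:
    """In-memory histogram keyed by (pc, kind). Hypothetical sketch."""

    def __init__(self):
        self.counts = Counter()

    def record(self, pc, kind):
        # kind is 'instr', 'fp_instr', or a stall-type label.
        # Only a counter is bumped; nothing is written per cycle.
        self.counts[(pc, kind)] += 1

    def dump(self, out):
        # Called once at the end of simulation.
        out.write("pc,kind,cycles\n")
        for (pc, kind), n in sorted(self.counts.items()):
            out.write(f"{pc:#010x},{kind},{n}\n")


def run_example():
    hist = PCHistogram()
    # Made-up per-cycle events; 'stall_depend' is an assumed label.
    events = [
        (0x100, "instr"),
        (0x104, "fp_instr"),
        (0x104, "stall_depend"),
        (0x104, "stall_depend"),
        (0x100, "instr"),
    ]
    for pc, kind in events:
        hist.record(pc, kind)
    buf = io.StringIO()
    hist.dump(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(run_example(), end="")
```

The output size scales with the number of distinct (PC, kind) pairs rather than with the number of simulated cycles, which is why no multi-GB trace is needed. It is also why the opcode is lost in this sketch: only the coarse kind is kept as part of the key.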

The overhead from using the fast PC histogram was never more than 20% over fast non-profiling simulation, and simulation ran as much as 3x faster than with the trace enabled.

Note that this is unsuitable for blood graph generation. You still need to use tracing for that.

drichmond commented 2 years ago

Very excited to see this merged in

mrutt92 commented 2 years ago

@tommydcjung anything else? remove the macros and then good-to-go?

tommydcjung commented 2 years ago

Can we remove the macros so we can look at the resulting code? We might find more issues then.

mrutt92 commented 2 years ago

Are we done reviewing everything else?

mrutt92 commented 2 years ago

I went ahead and removed the macros.

mrutt92 commented 2 years ago

Ok good to go?

mrutt92 commented 2 years ago

good-to-go?