ChampSim / ChampSim

ChampSim is an open-source, trace-based simulator maintained at Texas A&M University with the support of the computer architecture community.
https://champsim.github.io/ChampSim/
Apache License 2.0

Trace Generator #351

Open GinoAC opened 1 year ago

GinoAC commented 1 year ago

The goal of this feature is to allow ChampSim to generate its own traces based on a set of parameters defined at runtime. This would mainly benefit module testing and rapid prototyping.

ngober commented 10 months ago

I've had a good think about this one. I think writing a trace generator is probably extremely valuable in the long run. There is an idea in software testing called fuzzing, where you write a test that varies its input according to certain parameters in order to tease out bugs that are hard to see normally.

One example is the FuzzTest framework: https://github.com/google/fuzztest

This can serve as a starting point for a trace generator. It starts as an integration test: we feed the simulator randomized traces (that may or may not be parameterized), and we wait for ChampSim to crash. If it does not, that increases our confidence in its correctness. As our capabilities grow, we can extend this to well-structured parameterizations, and eventually I think the fuzz testing and the trace generation could begin to look pretty similar.
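A minimal sketch of the first step described above: build a trace in which every field is random, hand it to the simulator, and see whether the run survives. The 64-byte record layout is an assumption and should be checked against the tracer headers in the repository; `random_trace` and `survives` are hypothetical helper names, and the simulator command is left to the caller.

```python
import random
import struct
import subprocess

# Assumed record layout (not confirmed in this thread): ip, is_branch,
# branch_taken, 2 destination registers, 4 source registers,
# 2 destination addresses, 4 source addresses; 64 bytes, little-endian.
RECORD = struct.Struct("<QBB2B4B2Q4Q")

def random_record(rng):
    """Build one record whose every field is drawn uniformly at random."""
    return RECORD.pack(
        rng.getrandbits(64),                        # ip
        rng.getrandbits(1),                         # is_branch
        rng.getrandbits(1),                         # branch_taken
        *(rng.getrandbits(8) for _ in range(6)),    # register ids
        *(rng.getrandbits(64) for _ in range(6)),   # memory addresses
    )

def random_trace(num_instrs, seed):
    """Return the bytes of a fully random trace; the seed makes a failure reproducible."""
    rng = random.Random(seed)
    return b"".join(random_record(rng) for _ in range(num_instrs))

def survives(command):
    """Run the caller-supplied simulator command and report a clean exit."""
    return subprocess.run(command, capture_output=True).returncode == 0
```

Logging the seed with each run means a crashing trace can be regenerated exactly instead of being stored.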

GinoAC commented 10 months ago

I think that's a good game plan for this moving forward. Do you think there are any parameters required for the basic trace gen/fuzzer?

I believe I have a prototype trace generator that just takes in basic block size and data contiguity. I'll dig through it and see if there's anything useful I can push as a starting point.
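For illustration only, here is one way a generator with those two parameters could look. This is not the prototype mentioned above. It assumes "basic block size" means the number of instructions between branches and "data contiguity" means the probability that a load touches the cache line right after the previous one; `Instr` and `generate` are hypothetical names.

```python
import random
from typing import NamedTuple

class Instr(NamedTuple):
    ip: int
    is_branch: bool
    load_addr: int

def generate(num_instrs, block_size, contiguity, seed=0, line=64):
    """Generate instructions, each with one load, from the two parameters."""
    rng = random.Random(seed)
    ip, addr = 0x400000, 0x10000000
    out = []
    for i in range(num_instrs):
        # Contiguous accesses step to the next cache line; others jump randomly.
        if rng.random() < contiguity:
            addr += line
        else:
            addr = rng.randrange(1 << 30) * line
        ends_block = (i + 1) % block_size == 0
        out.append(Instr(ip, ends_block, addr))
        # A branch ending a block jumps to a random target; otherwise fall through.
        ip = rng.randrange(1 << 40) * 4 if ends_block else ip + 4
    return out
```

With `contiguity=1.0` the loads form a pure streaming pattern, and with `contiguity=0.0` every load lands on an unrelated line.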

ngober commented 10 months ago

We can parameterize the fuzz tester if we want, but a basic fuzz test is entirely "The program does not catastrophically fail when given completely random inputs". From there, we can say "The program doesn't fail when the input is bounded in X way" or "The program raises Y error when the input is bounded in Z way". Later on, we can get to "The data accesses fit within the L1 cache", or other particular things.
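A bound such as "the data accesses fit within the L1 cache" can be sketched as a constrained generator plus a check on what it produced. The 32 KiB capacity and 64-byte line size below are assumed values rather than ChampSim defaults, and both function names are hypothetical.

```python
import random

def bounded_addresses(count, footprint_bytes, base=0x10000000, line=64, seed=0):
    """Draw line-aligned addresses from a region of footprint_bytes."""
    rng = random.Random(seed)
    lines = footprint_bytes // line
    return [base + rng.randrange(lines) * line for _ in range(count)]

def footprint(addresses, line=64):
    """Number of bytes covered by the distinct cache lines touched."""
    return len({a // line for a in addresses}) * line

L1_BYTES = 32 * 1024  # assumed capacity, for illustration only
addrs = bounded_addresses(10_000, L1_BYTES)
assert footprint(addrs) <= L1_BYTES
```

A test built on this could then assert something about the simulator's output, for example that misses stop growing once every line has been touched.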

GinoAC commented 10 months ago

I agree, this is a good starting point. My concern is that the fuzzer won't achieve high coverage with purely random accesses, since randomness reduces collisions in specific structures (e.g., we'll likely reach the MSHR capacity, but we're unlikely to trigger an MSHR hit).
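The coverage concern can be illustrated with a toy model. The sketch below counts how often an access matches one of the previous 16 accesses, a crude stand-in for matching an in-flight miss and not ChampSim's MSHR model. The window size, the 58-bit line address space, and the hot-set parameters are all assumptions.

```python
import random
from collections import deque

def window_hits(lines, window):
    """Count accesses whose line matches one of the previous `window` accesses."""
    recent = deque(maxlen=window)
    hits = 0
    for line in lines:
        hits += line in recent
        recent.append(line)
    return hits

def uniform_lines(count, rng, line_bits=58):
    """Purely random line addresses."""
    return [rng.getrandbits(line_bits) for _ in range(count)]

def biased_lines(count, rng, hot=8, p_hot=0.5, line_bits=58):
    """Random line addresses, with a fraction redirected to a small hot set."""
    hot_set = [rng.getrandbits(line_bits) for _ in range(hot)]
    return [rng.choice(hot_set) if rng.random() < p_hot else rng.getrandbits(line_bits)
            for _ in range(count)]

rng = random.Random(0)
uniform = window_hits(uniform_lines(10_000, rng), window=16)
biased = window_hits(biased_lines(10_000, rng), window=16)
print(uniform, biased)
```

Under these assumptions the uniform stream almost never re-references a recent line, while the biased stream does so constantly, which is the gap in coverage being described.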

ngober commented 10 months ago

That's true, but we also already have a test that performs MSHR hits. The point of fuzz testing is to illuminate weird cases that your tests don't already cover.

GinoAC commented 10 months ago

That's partially my point: we need these events to occur simultaneously during fuzzing. I've played with fuzzing before, and without some direction steering it toward the weird corner cases, it can devolve into random accesses that don't strain multiple structures at the same time. I guess I personally lean more towards mutation-based fuzzing, though.
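A rough sketch of the mutation-based approach: start from a seed access sequence that already stresses a structure and apply one small change at a time, so that the mutants stay close to the interesting behavior. The three mutation operators are illustrative choices, `mutate` is a hypothetical name, and a real mutation fuzzer would also use coverage feedback to decide which mutants to keep, which is omitted here.

```python
import random

def mutate(seed_addrs, rng, line=64):
    """Return a copy of the seed access sequence with one small random change."""
    out = list(seed_addrs)
    i = rng.randrange(len(out))
    op = rng.choice(("repeat", "neighbor", "swap"))
    if op == "repeat":
        out.insert(i, out[i])              # re-reference the same line back to back
    elif op == "neighbor":
        out[i] += line                     # move the access to the adjacent line
    else:
        j = rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]    # reorder two accesses
    return out
```

Applying `mutate` repeatedly to its own output walks gradually away from the seed, in contrast to a purely random generator, which discards all structure at once.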

ngober commented 10 months ago

Ah, I see your point. There's probably something we can do, but if you have some kind of generator in progress, that might make a better starting point.