PolyArch / gem-forge-framework

BSD 2-Clause "Simplified" License

What are gem5 simulation options for streaming engine in SSP? #1

Open · haeunlee99 opened this issue 2 years ago

haeunlee99 commented 2 years ago

Hello,

I want to try out the streaming engine introduced in the paper "Stream-based Memory Access Specialization for General Purpose Processors". I am running experiments with what I think is the right configuration, but I am new to gem5 and not sure whether I am doing it correctly. If possible, could someone provide the gem5 simulation options needed to reproduce the engine introduced in that paper?

Thanks a lot! Haeun

seanzw commented 2 years ago

I think these are the options to configure the stream engine in SSP for the 8-core out-of-order configuration:

        "--gem-forge-stream-engine-enable",
        "--gem-forge-stream-engine-total-run-ahead-bytes=2048",
        "--gem-forge-stream-engine-enable-lsq",
        "--gem-forge-stream-engine-enable-coalesce",
        "--gem-forge-stream-engine-throttling=global",

You can play with them and take a look at gem5/configs/example/gem_forge/run.py for the full list of options.
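A minimal Python sketch of plugging the option list above into a complete gem5 invocation. The gem5 binary path `gem5/build/X86/gem5.opt` and the helper name `build_gem5_argv` are hypothetical; only the stream-engine flags, `--cpu-type`/`--num-cpus`, and the run.py path come from this thread, and workload-specific arguments are left out.

```python
import shlex

# Stream-engine flags quoted in this thread (SSP, 8-core out-of-order config).
STREAM_ENGINE_OPTIONS = [
    "--gem-forge-stream-engine-enable",
    "--gem-forge-stream-engine-total-run-ahead-bytes=2048",
    "--gem-forge-stream-engine-enable-lsq",
    "--gem-forge-stream-engine-enable-coalesce",
    "--gem-forge-stream-engine-throttling=global",
]

# Hypothetical binary path: adjust to your own gem5 build.
GEM5_BINARY = "gem5/build/X86/gem5.opt"
RUN_SCRIPT = "gem5/configs/example/gem_forge/run.py"


def build_gem5_argv(extra_options):
    """Return an argv list: gem5 binary, config script, then script options.

    gem5's own options (e.g. --outdir) would go before the script;
    everything after the script is passed to run.py.
    """
    return [GEM5_BINARY, RUN_SCRIPT, *extra_options, *STREAM_ENGINE_OPTIONS]


if __name__ == "__main__":
    argv = build_gem5_argv(["--cpu-type=DerivO3CPU", "--num-cpus=8"])
    # Print a shell-ready command line (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
```

Keeping the flags in a Python list (as in the reply above) makes it easy to pass the same argv to `subprocess.run` or to vary one flag at a time.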

haeunlee99 commented 2 years ago

Thanks for the reply, and sorry for asking again :( Are those options also appropriate for a single out-of-order core SSP configuration? Here is my configuration:

        --llvm-store-queue-size=32 \
        --llvm-mcpat=0 \
        --caches \
        --l2cache \
        --gem-forge-num-active-cpus=1 \
        --gem-forge-cache-load-ports=6 \
        --gem-forge-cache-store-ports=4 \
        --link-width-bits=256 \
        --llc-select-low-bit=6 \
        --gem-forge-enable-func-acc-tick \
        --prog-interval=10000 \
        --tlb-timing-se \
        --l1tlb-size=64 \
        --l1tlb-assoc=8 \
        --l2tlb-size=2048 \
        --l2tlb-assoc=16 \
        --l2tlb-hit-lat=8 \
        --walker-se-lat=16 \
        --walker-se-port=2 \
        --num-cpus=1 \
        --num-l2caches=1 \
        --ruby \
        --access-backing-store \
        --router-latency=2 \
        --link-latency=1 \
        --mem-channels=2 \
        --mem-size=16GB \
        --l1i_size=32kB \
        --l1i_assoc=8 \
        --l1d_size=32kB \
        --l1d_lat=8 \
        --l1d_mshrs=8 \
        --l1d_assoc=8 \
        --l1_5d_size=256kB \
        --l1_5d_assoc=16 \
        --l1_5d_mshrs=16 \
        --l2_lat=16 \
        --l2_size=1MB \
        --l2_assoc=16 \
        --l3_lat=20 \
        --fast-forward=-1 \
        --options=1 \
        --cpu-type=DerivO3CPU \
        --llvm-issue-width=8 \
        --gem-forge-stream-engine-enable \
        --gem-forge-stream-engine-total-run-ahead-bytes=2048 \
        --gem-forge-stream-engine-enable-lsq \
        --gem-forge-stream-engine-enable-coalesce \
        --gem-forge-stream-engine-throttling=global

I have omitted the following options, which seem to be related to a multi-core environment:

        --num-dirs=4 \
        --mesh-rows=8 \
        --network=garnet2.0 \
        --garnet-enable-multicast \
        --topology=MeshDirCorners_XY \
        --routing-YX

But I am not sure whether these options are necessary:

        --link-width-bits=256 \
        --gem-forge-enable-func-acc-tick \
        --access-backing-store \
        --router-latency=2 \
        --link-latency=1 \
        --gem-forge-stream-engine-throttling=global
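When trimming a multi-core command line down to a single core, it can help to list exactly which flags were dropped or changed. Below is a small Python sketch for that. The helper names `parse_options` and `diff_options` are hypothetical, and the sample lists are short illustrative subsets of flags from this thread (the `--num-cpus=8` value is assumed for the 8-core case).

```python
def parse_options(options):
    """Map each '--flag[=value]' string to its value (None for bare flags)."""
    parsed = {}
    for opt in options:
        # Drop shell line-continuation debris such as a trailing backslash.
        opt = opt.strip().rstrip("\\").strip()
        if not opt:
            continue
        name, sep, value = opt.partition("=")
        parsed[name] = value if sep else None
    return parsed


def diff_options(full, reduced):
    """Return (dropped, changed) flag names when going from `full` to `reduced`."""
    a, b = parse_options(full), parse_options(reduced)
    dropped = sorted(name for name in a if name not in b)
    changed = sorted(name for name in a if name in b and a[name] != b[name])
    return dropped, changed


if __name__ == "__main__":
    multi_core = ["--num-cpus=8", "--mesh-rows=8", "--network=garnet2.0",
                  "--routing-YX", "--router-latency=2"]
    single_core = ["--num-cpus=1", "--router-latency=2"]
    dropped, changed = diff_options(multi_core, single_core)
    print(dropped)  # ['--mesh-rows', '--network', '--routing-YX']
    print(changed)  # ['--num-cpus']
```

Comparing parsed dictionaries rather than raw strings means a flag whose value changed (such as `--num-cpus`) is reported separately from one that was removed entirely.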

I also have several questions about this configuration:

  1. Since the L3 cache should also be connected to memory, are link-width-bits, router-latency, and link-latency necessary?
  2. Does the access-backing-store option mean we have DRAM?
  3. What is the difference between dynamic throttling and global throttling? Isn't the one introduced in the paper dynamic throttling?
  4. What does the gem-forge-enable-func-acc-tick option do?

Thank you :)

seanzw commented 2 years ago

Sorry for the late reply.

I think you are mixing up some options here. gem5 has two cache systems: the classic cache system and Ruby. Which one are you trying to use? All the mesh topology and link-width options are specific to Ruby. The following explanations assume you are using Ruby.

  1. In our configuration, the L3 cache is not directly connected to DRAM. Instead, they communicate through the routers and the mesh network, so link-width-bits, router-latency, and link-latency still matter here.
  2. access-backing-store is a Ruby option that makes Ruby always get the data from a backing "ground-truth" store. We need it to reduce implementation effort. Regardless of this option, we always have DRAM.
  3. IIRC, global throttling is the one we evaluated in the paper. You can ignore dynamic throttling; it is a design choice that we did not end up using.
  4. gem-forge-enable-func-acc-tick makes gem5 dump, into the output folder, a breakdown of how many execution cycles are spent within each function. It is a profiling flag and should not change the simulation results.
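To illustrate why router-latency and link-latency still matter even with one core, here is a rough back-of-envelope sketch in Python. It is a simplification, not a model of Garnet: it assumes dimension-ordered (XY or YX) routing on a mesh, so the hop count is the Manhattan distance, and that a packet passes through hops + 1 routers and `hops` inter-router links. The tile coordinates and helper names are hypothetical, and network-interface links, flit serialization, and contention are ignored.

```python
def mesh_hops(src, dst):
    """Inter-router links traversed between (row, col) tiles under XY/YX routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def zero_load_latency(src, dst, router_latency=2, link_latency=1):
    """Rough zero-load estimate in cycles.

    Assumes hops + 1 routers and `hops` links on the path; ignores
    network-interface links, flit serialization, and contention.
    """
    hops = mesh_hops(src, dst)
    return (hops + 1) * router_latency + hops * link_latency


if __name__ == "__main__":
    # Hypothetical placement: L3 bank at (0, 0), memory controller at (3, 3).
    for router_lat in (2, 4):
        cycles = zero_load_latency((0, 0), (3, 3), router_latency=router_lat)
        print(f"router-latency={router_lat}: ~{cycles} cycles one way")
```

With these assumptions, a 6-hop path costs about 20 cycles one way at router-latency=2 and about 34 cycles at router-latency=4, so these flags directly shift every L3-to-DRAM round trip. Link width is not modeled here; wider links generally mean fewer flits per cache-line message, so link-width-bits would mainly affect the serialization delay this sketch leaves out.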

I hope this answers your questions. Let me know if you have any more issues.