CSL-KU / firesim-nvdla

FireSim-NVDLA: NVIDIA Deep Learning Accelerator (NVDLA) Integrated with RISC-V Rocket Chip SoC Running on the Amazon FPGA Cloud

How to change the LLC size #14

Open ku-researcher opened 4 years ago

ku-researcher commented 4 years ago

Hi @farzadfch. I read your paper "Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim". In the paper, you vary the LLC size, and I have two questions. (i) Could you tell me how to change the LLC size and measure NVDLA's running time? (ii) Is it possible to measure the number of DRAM accesses when changing the LLC size? So far, I have succeeded in running YOLOv3 on NVDLA. Any advice would be appreciated. I'm sorry to trouble you again.

farzadfch commented 4 years ago

(i) The LLC size can be configured using the runtime config file (.conf). When you simulate a design, firesim creates runtime.conf with default values in firesim-nvdla/sim/output/f1/FireSimNoNIC-FireSimRocketChipQuadCoreConfig_WithNVDLALarge-FireSimDDR3FRFCFSLLC4MBConfig75MHz.

To change the default values, copy this file to firesim-nvdla/sim/custom-runtime-configs and rename it. For LLC, you want to change mm_llc_wayBits, mm_llc_setBits, and mm_llc_blockBits. Then, in firesim-nvdla/deploy/config_hwdb.ini, change customruntimeconfig at line 45 to the name of your custom config file.
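As a rough aid for picking values, the sketch below computes the LLC capacity implied by the three parameters. It assumes that each field is the base-2 logarithm of the number of ways, the number of sets, and the block size in bytes respectively; that reading is an assumption based on the parameter names and should be checked against the generated runtime.conf. The helper name and the sample values are illustrative, not taken from the default config.

```python
def llc_size_bytes(way_bits: int, set_bits: int, block_bits: int) -> int:
    """Capacity in bytes, ASSUMING each argument is log2 of ways, sets, block bytes."""
    ways = 1 << way_bits
    sets = 1 << set_bits
    block_bytes = 1 << block_bits
    return ways * sets * block_bytes


# Illustrative (hypothetical) settings: same ways and block size, fewer sets each time.
for way_bits, set_bits, block_bits in [(3, 12, 7), (3, 11, 7), (3, 10, 7)]:
    size = llc_size_bytes(way_bits, set_bits, block_bits)
    print(
        f"mm_llc_wayBits={way_bits} mm_llc_setBits={set_bits} "
        f"mm_llc_blockBits={block_bits} -> {size / 2**20:g} MiB"
    )
```

Under this assumption, decrementing mm_llc_setBits by one halves the capacity while leaving associativity and block size unchanged.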

(ii) Yes, it is. FireSim has performance counters for LLC and DRAM. These counters can be dumped at a configurable interval, by setting profileinterval in config_runtime.ini. This will dump the counters in memory_stats.csv on the F1 instance. To tell firesim to copy this file to the manager, add "memory_stats.csv" to "common_simulation_outputs" in firesim-nvdla/deploy/workloads/darknet-nvdla.json.
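The workload edit described above can be sketched in a few lines of Python. Only the key name "common_simulation_outputs" and the file name "memory_stats.csv" come from this thread; the other entries in the sample dictionary are placeholders, and the helper name is hypothetical.

```python
import json


def add_simulation_output(workload: dict, filename: str) -> dict:
    """Append filename to common_simulation_outputs unless it is already listed."""
    outputs = workload.setdefault("common_simulation_outputs", [])
    if filename not in outputs:
        outputs.append(filename)
    return workload


# Placeholder workload description; the real darknet-nvdla.json may contain other keys.
workload = {
    "benchmark_name": "darknet-nvdla",
    "common_simulation_outputs": ["uartlog"],
}

add_simulation_output(workload, "memory_stats.csv")
print(json.dumps(workload, indent=2))
```

In practice the dictionary would be loaded from and written back to firesim-nvdla/deploy/workloads/darknet-nvdla.json with json.load and json.dump.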

ku-researcher commented 4 years ago

Thank you for your reply. Regarding (ii), following your advice, I changed profileinterval in config_runtime.ini. As a test, I set profileinterval = 1000000000 and got the following memory_stats.csv:

brespError,llc_misses,llc_refills,llc_writebacks,rankPower_0_allPreCycles,rankPower_0_numACT,rankPower_0_numCASR,rankPower_0_numCASW,rankPower_1_allPreCycles,rankPower_1_numACT,rankPower_1_numCASR,rankPower_1_numCASW,rankPower_2_allPreCycles,rankPower_2_numACT,rankPower_2_numCASR,rankPower_2_numCASW,rankPower_3_allPreCycles,rankPower_3_numACT,rankPower_3_numCASR,rankPower_3_numCASW,readOutstandingHistogram_0,readOutstandingHistogram_1,readOutstandingHistogram_2,readOutstandingHistogram_3,readOutstandingHistogram_4,rrespError,totalReadBeats,totalReads,totalWriteBeats,totalWrites,writeOutstandingHistogram_0,writeOutstandingHistogram_1,writeOutstandingHistogram_2,writeOutstandingHistogram_3,writeOutstandingHistogram_4
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,8310264,8192286,8178074,64707752,1471687,2045112,2043220,64527002,1475566,2049546,2043880,64471229,1479187,2051713,2046221,64232340,1472986,2045915,2044752,709121229,290873887,4873,11,0,0,73537670,9192250,68264376,8533047,913562290,86437710,0,0,0
0,12758235,12470054,12048846,454880530,2261921,3124837,3025115,459853883,2249610,3128913,3018048,461987161,2238552,3104573,3008401,462006758,2224187,3111731,2997282,1450262568,544832601,4866909,37922,0,0,196280563,24535150,125925672,15740709,1840093911,159906089,0,0,0
0,14010394,13666307,12402473,1257331170,2454127,3446631,3122840,1268983546,2439398,3456201,3119605,1275802759,2397752,3374269,3093854,1280189388,2364203,3389206,3066174,2385093767,605904463,8939902,61868,0,0,223473408,27934295,131150872,16393859,2833191588,166808412,0,0,0

Now, I have two questions.

(a) I want to get the number of DRAM accesses when I run YOLOv3 on NVDLA. Which value should I look at?

(b) Can I change the clock frequency of NVDLA (default: 75MHz)? And what are the operating frequencies of the LLC and DRAM?

farzadfch commented 4 years ago

For the number of accesses to DRAM, you can use llc_misses.
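A minimal sketch of reading llc_misses out of the dump follows. It assumes the counters are cumulative rather than reset at each sample; the dump posted above suggests this, since the readOutstandingHistogram buckets in each row sum to a multiple of the 1000000000-cycle profileinterval. The sample text holds only three columns copied from that dump, and the function name is hypothetical.

```python
import csv
import io


def per_interval_deltas(csv_text: str, column: str) -> list[int]:
    """Differences between consecutive samples of a cumulative counter column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    totals = [int(row[column]) for row in rows]
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Three columns copied from the memory_stats.csv posted in this thread.
SAMPLE = """llc_misses,llc_refills,llc_writebacks
0,0,0
8310264,8192286,8178074
12758235,12470054,12048846
14010394,13666307,12402473
"""

deltas = per_interval_deltas(SAMPLE, "llc_misses")
print("LLC misses per interval:", deltas)
print("LLC misses in total:", sum(deltas))
```

For a real run, the text would come from the memory_stats.csv file copied back to the manager, and the last row gives the running total directly.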

The target (i.e. simulated) machine clock frequency is artificially set to 3.2 GHz. To change that, please read this answer by Sagar on the FireSim Google group. There is a broken link in that answer. You can use this link instead.

ku-researcher commented 4 years ago

Thank you for your reply.

I have a question about your paper "Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim". In the paper, the clock frequency of NVDLA and the processor is 3.2GHz. Figure 4 shows 7.5 fps (NVDLA and Rocket). Is this the performance when NVDLA and Rocket run at 75MHz rather than 3.2GHz? Also, I think the real clock frequency of NVDLA differs from that of DRAM. I believe NVDLA runs at 75MHz, but what is the DRAM frequency?

Thank you

farzadfch commented 4 years ago

Sorry for the delayed reply. I know that this can be puzzling but you should not confuse the "target" and the "FPGA host" clocks. Here, 3.2GHz is the target clock frequency. It is the simulated clock frequency and all of the performance measurements are based on this frequency. That means the 7.5 fps is based on NVDLA (and the Rocket processor) clocked at 3.2GHz.

The 75MHz clock is the FPGA host clock frequency. You do not really need to care about this clock but if you want to learn more, you can check out the original FireSim paper and "FASED: FPGA-Accelerated Simulation and Evaluation of DRAM" and also look up some of their references. This may help you understand how FireSim "decouples" target and host clocks and lets you simulate a design at 3.2GHz on an FPGA that is clocked at 75MHz.

Currently, FireSim does not support multiple clock domains. That means DRAM is simulated at the same frequency as the rest of the design (i.e. 3.2GHz). AFAIK, multi-clock support is planned for a later version and I hope I can integrate it into FireSim-NVDLA.
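The target/host distinction can be made concrete with a little arithmetic. The sketch below converts a target cycle count into simulated (target) time at 3.2GHz, and into a lower bound on wall-clock time on a 75MHz host. The lower bound rests on a simplifying assumption that the host advances at most one target cycle per host cycle; real runs can be slower. The function names are illustrative.

```python
TARGET_HZ = 3.2e9  # simulated clock; all reported performance is based on this
HOST_HZ = 75e6     # FPGA host clock; affects only how long the simulation takes


def target_seconds(target_cycles: float, target_hz: float = TARGET_HZ) -> float:
    """Simulated time covered by a given number of target cycles."""
    return target_cycles / target_hz


def min_host_seconds(target_cycles: float, host_hz: float = HOST_HZ) -> float:
    """Lower bound on wall-clock time, ASSUMING at most one target cycle per host cycle."""
    return target_cycles / host_hz


interval = 1_000_000_000  # the profileinterval used earlier in this thread
print(f"target time per interval: {target_seconds(interval):.4f} s")
print(f"host time per interval:   at least {min_host_seconds(interval):.2f} s")

# One frame at 7.5 fps, expressed in target cycles.
cycles_per_frame = TARGET_HZ / 7.5
print(f"target cycles per frame at 7.5 fps: {cycles_per_frame:.3e}")
```

So a 1000000000-cycle profiling interval covers 0.3125 s of simulated time regardless of the host clock; the 75MHz figure only determines how long one has to wait for it.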