Closed: wenwang3122 closed this issue 4 years ago
I am not sure if it is suitable to post such a question here
No worries, that's fine ^^
Do you have any ideas about the cycle overhead?
Hmm, I would need a waveform to figure out the timings. If you want, I can point out some key signals? It might be in part because the instruction bus is accessing the bus at the same time, which creates some conflicts. The data bus and instruction bus share the same bus to access peripherals and RAM, see "val mainBusArbiter = new MuraxMasterArbiter(pipelinedMemoryBusConfig)"; your diagram isn't accurate on that point: (dBus + iBus) -> mainBusArbiter -> (ram + peripheral)
Or do you have any recommendations about how to do the software-hardware co-design based on VexRiscv in a different setup to reduce the IO overhead?
Yes, basically: you could add a custom VexRiscv instruction to directly feed the accelerator with CPU data. Otherwise, less optimal but still better than nothing, you could avoid APB and directly define a SimpleBus peripheral.
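For reference, a minimal software-side sketch of the custom-instruction option, assuming a hypothetical R-type instruction in the custom-0 opcode space (opcode 0x0b, funct3/funct7 chosen arbitrarily) and a GNU toolchain whose assembler supports the .insn directive; the matching VexRiscv plugin that actually forwards the operands to the accelerator is not shown:

#include <stdint.h>

/* Hypothetical "push one word into the accelerator" instruction:
   custom-0 opcode (0x0b), funct3 = 0, funct7 = 0, operand in rs1. */
static inline void accel_push(uint32_t word)
{
    asm volatile(".insn r 0x0b, 0, 0, x0, %0, x0" : : "r"(word));
}

/* Hypothetical "pop one result word from the accelerator" instruction:
   same opcode, funct3 = 1, result delivered in rd. */
static inline uint32_t accel_pop(void)
{
    uint32_t result;
    asm volatile(".insn r 0x0b, 1, 0, %0, x0, x0" : "=r"(result));
    return result;
}

The idea is that each data word travels from a CPU register straight into the accelerator, without going through the shared main bus and the APB bridge at all.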
Would switching to the Briey SoC (with Dcache, Icache, MMU, AXI bus) help reduce the IO overhead?
Yes and no. Yes, because it would avoid the shared memory bus issue; no, because the way the d$ accesses MMIO is less direct and has additional penalties compared to the $less design.
I think the very best would really be to use a custom instruction to drive some stream of data, if that's an option for you :)
Thank you for the feedback, that helps a lot in understanding the experimental results, and I now have a pretty good understanding of how to optimize the design.
Hi!
The following question is not about a bug in the code, it is more of an IO overhead question regarding my software-hardware co-design experiments based on the Murax SoC. I am not sure if it is suitable to post such a question here, please feel free to let me know if it is not okay to do so.
Here is the background: I am using Murax SoC as a platform to do software-hardware co-designs, and the hardware accelerators are added as APB peripherals. I attached a sample diagram below (murax-co-design-sample-diagram.png).
In this setup, the software-hardware interface overhead is critical for the overall performance. During my experiments, I found out that:
Suppose data_in and data_out are MMIO registers of the accelerator. When the operand of an APB access is a plain register variable, e.g.:
data_in[0] = variable_reg;
the APB write takes 3 cycles and the APB read takes 4 cycles. When the APB access is combined with a memory access, e.g.:
data_in[0] = variable_array[i];
result_array[i] = data_out[0];
the APB write increases from 3 to 6/8/9 cycles (different scenarios) and the APB read increases from 4 to 8/9/10 cycles (different scenarios). The increase (APB write/read alone vs. APB write/read + memory access) exceeds the cycles of the memory access alone (based on my benchmarks).
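For clarity, here is roughly what the two cases look like in C; the base address and register names below are placeholders for my accelerator's APB mapping, not the actual Murax memory map:

#include <stdint.h>

/* Placeholder accelerator MMIO registers behind the APB bridge. */
#define ACCEL_BASE     0xF0010000u
#define ACCEL_DATA_IN  (*(volatile uint32_t *)(ACCEL_BASE + 0x00))
#define ACCEL_DATA_OUT (*(volatile uint32_t *)(ACCEL_BASE + 0x04))

/* Case 1: operand already in a CPU register, so only the APB write
   itself is issued (~3 cycles measured). */
void write_from_register(uint32_t variable_reg)
{
    ACCEL_DATA_IN = variable_reg;
}

/* Case 2: the APB access is combined with a RAM load/store
   (~6-9 cycles for the write, ~8-10 cycles for the read, measured). */
void write_and_read_with_memory(const uint32_t *variable_array,
                                uint32_t *result_array, int i)
{
    ACCEL_DATA_IN = variable_array[i];
    result_array[i] = ACCEL_DATA_OUT;
}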
Here are my questions:
1. Do you have any ideas about the cycle overhead?
2. Or do you have any recommendations about how to do the software-hardware co-design based on VexRiscv in a different setup to reduce the IO overhead?
3. Would switching to the Briey SoC (with Dcache, Icache, MMU, AXI bus) help reduce the IO overhead?
Thank you very much! Wen