chipsalliance / rocket-chip

Rocket Chip Generator

Architecture: Scatter Gather implementation #1607

Closed tampler closed 6 years ago

tampler commented 6 years ago

Hi guys

For a 2-core Rocket-based Linux system, I need to implement a non-cached, non-paged scatter-gather (SG) DMA as part of the RoCC module. The DMA needs high-throughput, low-latency, word-aligned access to a multibank DRAM. The maximum burst size in SG mode is 8 kB, which maps to 1024 beats (assuming 8-byte beats) and has to be split across multiple AXI4 transactions, because AXI4 limits a single burst to 256 beats.
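To make the burst arithmetic concrete, here is a minimal Python sketch of how one SG segment could be split into AXI4 bursts. It is only an illustration, not the actual DMA design: the 8-byte beat width is inferred from 8 kB / 1024 beats, the function names `split_into_bursts` and `split_sg_list` are hypothetical, and for simplicity the sketch requires beat-aligned addresses and lengths. Besides the 256-beat limit, it also honours the AXI rule that a burst must not cross a 4 KB address boundary.

```python
from typing import List, Tuple

BEAT_BYTES = 8     # assumption: 64-bit data bus (8192 bytes / 1024 beats)
MAX_BEATS = 256    # AXI4 limit on the length of one INCR burst
BOUNDARY = 4096    # an AXI burst must not cross a 4 KB address boundary


def split_into_bursts(addr: int, length: int) -> List[Tuple[int, int]]:
    """Split one contiguous transfer into (start_address, beats) bursts."""
    if addr % BEAT_BYTES or length % BEAT_BYTES:
        raise ValueError("address and length must be beat-aligned")
    bursts = []
    end = addr + length
    while addr < end:
        # Bytes left before the next 4 KB boundary.
        to_boundary = BOUNDARY - (addr % BOUNDARY)
        # A burst is capped by the remaining data, the beat limit,
        # and the boundary rule, whichever is smallest.
        nbytes = min(end - addr, MAX_BEATS * BEAT_BYTES, to_boundary)
        bursts.append((addr, nbytes // BEAT_BYTES))
        addr += nbytes
    return bursts


def split_sg_list(descriptors: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Flatten a list of (address, length) SG descriptors into bursts."""
    return [b for addr, length in descriptors
            for b in split_into_bursts(addr, length)]
```

Under these assumptions, an aligned 8 kB segment such as `split_into_bursts(0x80000000, 8192)` yields four bursts of 256 beats each, while a segment that straddles a 4 KB boundary is cut at the boundary first.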

The result computed by the RoCC accelerator should be shared among multiple Rocket cores. The RoCC output buffer is also at most 8 kB.

What is the best architectural approach for implementing such a system? There will be several long-latency operations (thousands of cycles), and each of them will stall the issuing core, as per the RoCC implementation.

tampler commented 6 years ago

After peeking and poking around the Rocket documentation and Chisel code, I ended up using the TL-UH protocol, which is already implemented in the LazyRoCC module. For those interested in more details, please refer to Issue #1611, where I'll elaborate on my experience with TileLink.