bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality

The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric #704

Status: closed by amithmath 1 year ago

amithmath commented 1 year ago

Hi,

I was reading the paper "The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric". As I understand it, the 511 RISC-V cores are connected to 64-bit RISC-V Rocket/BlackParrot cores. However, in the bsg_replicant/bsg_f1 repo (the old repo), the 511 RISC-V cores communicate with the host CPU through a PCIe slot. The same is mentioned in the HammerBlade Technical Reference Guide (https://docs.google.com/document/d/1b2g2nnMYidMkcn6iHJ9NGjpQYfZeWEmMdLeO_3nLtgo/edit), page 6, footnote 1: "The current version of HammerBlade does not yet include BlackParrot cores, but will soon. Instead, it is controlled by a Linux host over PCIe that connects to an I/O node on the manycore, much like a discrete GPU."

I am looking for a repo where the 511 RISC-V cores are connected to BlackParrot. Can you please point me to it?

I checked https://github.com/black-parrot and https://github.com/bespoke-silicon-group but was unable to locate it.

Many thanks, -Amit

dpetrisko commented 1 year ago

Hi,

We have an open-source example of BP+manycore here. This is an FPGA prototype of an ASIC we have taped out, so the default configuration is much smaller so that it fits on the FPGA. But if you're simulating or taping out, you can easily scale up the number of cores as needed.

https://github.com/black-parrot-hdk/zynq-parrot/tree/master/cosim/hammerblade-example

amithmath commented 1 year ago

I see your point. If I want to tape out bsg_f1 in TSMC 16 nm FinFET, I would need to communicate with bsg_f1 through PCIe and DRAM. What about these IPs for ASIC tape-outs? Do you have any of them, or would I have to use Cadence/Synopsys IPs?

dpetrisko commented 1 year ago

https://github.com/bespoke-silicon-group/basejump_stl/tree/master/bsg_link

We use bsg_link as an off-chip DDR tunnel for I/O. We have taped it out in 12 nm and 28 nm. We typically use off-the-shelf LVCMOS I/Os and live with the modest bandwidth. PCIe specifically requires analog components, so you will probably need to purchase that IP.
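To put "modest bandwidth" in perspective, here is a back-of-the-envelope sketch of the raw throughput of a source-synchronous DDR link, which moves one bit per data pin on each clock edge. The clock rate, pin count, and channel count below are hypothetical example values, not bsg_link's actual parameters, and protocol overhead is ignored.

```python
def ddr_link_bandwidth_gbps(clock_mhz: float, data_pins: int, channels: int = 1) -> float:
    """Raw bandwidth of a DDR link in Gbit/s.

    DDR signaling transfers on both clock edges, i.e. two bits per data pin
    per cycle. Framing and flow-control overhead are not modeled.
    """
    bits_per_second = 2 * clock_mhz * 1e6 * data_pins * channels
    return bits_per_second / 1e9


# Hypothetical example: 8 data pins per channel at 500 MHz.
one_channel = ddr_link_bandwidth_gbps(500, 8)               # 8.0 Gbit/s
four_channels = ddr_link_bandwidth_gbps(500, 8, channels=4)  # 32.0 Gbit/s
print(one_channel, four_channels)
```

Adding channels (and therefore pins) scales bandwidth linearly, which is the trade-off against pin cost when using plain LVCMOS pads instead of SerDes-based PCIe.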

https://github.com/bespoke-silicon-group/basejump_stl/tree/master/bsg_dmc

We also have an LPDDR1 controller, which has been taped out in 28 nm. Adapting it to a new DDR module is non-trivial, but fairly straightforward. Otherwise, the two main options are 1) using bsg_link and bsg_channel_tunnel to tunnel your DRAM traffic along with your I/O traffic, or 2) yes, purchasing an off-the-shelf DRAM IP. Option 1 is solid and cheap in terms of both IP cost and pin cost.
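Option 1 relies on multiplexing several logical channels (for example DRAM and I/O) over one physical link. The toy Python model below illustrates the general idea with per-channel credits and round-robin arbitration, so that a stalled DRAM channel cannot block I/O traffic. It is a hypothetical sketch of the concept only; the class and method names are made up and do not reflect the bsg_channel_tunnel RTL or its interface.

```python
from collections import deque


class ChannelTunnel:
    """Toy model: N logical channels share one physical link using credits.

    Hypothetical illustration, not the bsg_channel_tunnel implementation.
    """

    def __init__(self, num_channels, rx_depth):
        self.tx = [deque() for _ in range(num_channels)]
        self.rx = [deque() for _ in range(num_channels)]
        # One credit per free slot in the far-end receive buffer of each channel.
        self.credits = [rx_depth] * num_channels
        self.rr = 0  # channel favored next by round-robin arbitration

    def enqueue(self, ch, payload):
        self.tx[ch].append(payload)

    def cycle(self):
        """Send at most one flit over the shared link; return its channel, or None."""
        n = len(self.tx)
        for i in range(n):
            ch = (self.rr + i) % n
            if self.tx[ch] and self.credits[ch] > 0:
                self.credits[ch] -= 1
                # The flit travels tagged with its channel id and lands in that rx buffer.
                self.rx[ch].append(self.tx[ch].popleft())
                self.rr = (ch + 1) % n
                return ch
        return None

    def consume(self, ch):
        """Far end drains one flit from channel ch and returns a credit to the sender."""
        payload = self.rx[ch].popleft()
        self.credits[ch] += 1
        return payload


DRAM, IO = 0, 1
t = ChannelTunnel(num_channels=2, rx_depth=2)
for i in range(4):
    t.enqueue(DRAM, f"dram{i}")
t.enqueue(IO, "io0")
sent = [t.cycle() for _ in range(5)]
print(sent)  # [0, 1, 0, None, None]: DRAM stalls once its 2 credits are used up
```

The I/O flit still gets through while DRAM is waiting, and DRAM resumes as soon as the far end drains a flit and returns a credit.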

amithmath commented 1 year ago

Assuming I go with option 1, DDR + PCIe + bsg_f1 alone won't suffice, because when the card is turned on, I think it first has to go through a boot sequence before it is ready to use as a standalone accelerator card. On FPGA cards, I assume this is initially handled by a processor on the board, such as a hard ARM core or a soft MicroBlaze core. For a standalone ASIC accelerator card, I have no idea what the boot code/logic or the boot processor would be. Any suggestions?

dpetrisko commented 1 year ago

Would have to know more about your intended use case to make a recommendation. If you wanted a standalone card, BlackParrot is able to boot from a ROM and could then bootstrap the system from on-board flash.
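A first-stage boot flow of that kind (ROM code validates an image stored in flash, copies it into DRAM, and jumps to it) can be modeled in a few lines of Python. Everything here is a hypothetical sketch: the header layout, the magic value, and the function names are assumptions for illustration, not a BlackParrot-defined boot format.

```python
import struct
import zlib

MAGIC = 0x424F4F54  # hypothetical "BOOT" tag, not defined by BlackParrot
HEADER = struct.Struct("<IIII")  # hypothetical header: magic, load_addr, length, crc32


def make_flash_image(load_addr, payload):
    """Build a flash image: header followed by the program payload."""
    return HEADER.pack(MAGIC, load_addr, len(payload), zlib.crc32(payload)) + payload


def boot_rom(flash, dram):
    """First-stage loader: validate the flash image, copy it to DRAM, return the entry PC."""
    magic, load_addr, length, crc = HEADER.unpack_from(flash, 0)
    if magic != MAGIC:
        raise RuntimeError("no valid image in flash")
    payload = flash[HEADER.size:HEADER.size + length]
    if zlib.crc32(payload) != crc:
        raise RuntimeError("image checksum mismatch")
    dram[load_addr:load_addr + length] = payload
    return load_addr  # real ROM code would jump to this address


# Four RISC-V NOPs (addi x0, x0, 0 = 0x00000013, little-endian) as a stand-in program.
flash = make_flash_image(0x1000, b"\x13\x00\x00\x00" * 4)
dram = bytearray(0x2000)
entry = boot_rom(flash, dram)
print(hex(entry))  # 0x1000
```

In a real system the loaded image would typically be a second-stage loader that then initializes the rest of the chip (for example, loading kernels onto the manycore).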

We typically make test chips with many redundancies, so I/O is handled by the PS (processing system) of an attached FPGA board such as DoubleTrouble. You can find these (open-source) designs here: http://bjump.org/index.html

amithmath commented 1 year ago

I want to build it as a standalone GPGPU accelerator card for general-purpose computation, like NVIDIA graphics cards. Any recommendations?

dpetrisko commented 1 year ago

Gotcha, that is a complex task well outside the scope of this repo. Happy to chat offline if you want to discuss further: petrisko@cs.washington.edu

amithmath commented 1 year ago

Okay, I will email you.