bittide / bittide-hardware

Create concept for demo rack #228

Open lmbollen opened 1 year ago

lmbollen commented 1 year ago

The mining rig will eventually contain up to 9 FPGAs that each contain one (or more?) bittide domains. Each domain is controlled by a clock control algorithm that will run on a RISC-V soft core (VexRiscv). For each domain, every incoming link is connected to an elasticBuffer. These elastic buffers produce datacounts that represent the number of elements in the buffer.

The clock control algorithm uses the datacounts of all incoming links to control its own frequency. Ultimately we'd like a hardware experimentation platform that is remotely accessible, presumably through GitHub Actions.

This experimentation platform can be used to experiment with different configurations for the clock control algorithm for different topologies.

Currently we expect to require the following features:

Since this is mostly conceptual work, features can be added, dropped, or moved elsewhere. This issue can be closed when we have a conceptual overview of all steps that have to be performed to run an experiment via GitHub Actions, alongside an architectural overview of the required components.

kleinreact commented 1 year ago

Some first thoughts below. For the moment I primarily focused on the FPGA setup, because that is where I currently see most of the open decisions. The proposals mentioned here should be seen as a starting point for future discussions and can be reworked from scratch if required.

General Observations

One of the major challenges is managing the runtime data on each FPGA while the CCA is running at the same time. There are two proposals for realizing this:

Proposal 1: Use an additional managing core

This approach builds on a simpler concept, but is more resource-heavy on the FPGA. The main idea is to dedicate an extra RISC-V soft core solely to handling all the management tasks for loading and running the CCA. In particular, this core will be responsible for

The second core just runs the CCA. It only has access to the elasticBuffer data and the network topology, and produces the calculated clock modification strategies.

Pros:

Cons:

Proposal 2: Introduce the CCA as a single (side-effect free) C/Rust function

This approach only requires one CPU core, but restricts the execution context of the CCA. As in Proposal 1, all management tasks are still implemented in software, but they are executed on the same RISC-V soft core as the CCA. Running the CCA can then be seen as calling a special-purpose main function in C or Rust, where the topology and buffers are passed as arguments and which returns the clock adaptation strategy instead of an int. Note that we specifically choose main for this analogy, since main is the function that is compiled into an executable, which can be exchanged at runtime. The function is then called repeatedly by the management context and can (if necessary) also be scheduled at fixed times with a constant frequency.
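As a rough illustration only, here is a minimal Rust sketch of what such a function could look like; the names, types, and the toy control rule are assumptions for this example and not taken from any existing bittide interface:

```rust
/// Occupancy reported by the elastic buffer of one incoming link
/// (hypothetical type, for illustration only).
pub type DataCount = u32;

/// Placeholder for the real topology description: here just the number
/// of incoming links of this domain.
pub struct Topology {
    pub incoming_links: usize,
}

/// Clock adjustment requested from the clock multiplier board.
pub enum SpeedChange {
    SpeedUp,
    SlowDown,
    NoChange,
}

/// Side-effect free CCA step, the "special purpose main" of Proposal 2.
/// The management context calls it repeatedly, possibly at a fixed rate;
/// it only sees the topology and the current data counts and returns the
/// requested clock change.
pub fn cca_step(topology: &Topology, data_counts: &[DataCount]) -> SpeedChange {
    debug_assert_eq!(data_counts.len(), topology.incoming_links);

    // Toy rule: steer towards half-full buffers. If the incoming buffers
    // fill up, this domain reads too slowly and should speed up; if they
    // drain, it should slow down. The real control law is out of scope here.
    const HALF_FULL: u32 = 16;
    let total: u32 = data_counts.iter().sum();
    let target = HALF_FULL * data_counts.len() as u32;
    if total > target {
        SpeedChange::SpeedUp
    } else if total < target {
        SpeedChange::SlowDown
    } else {
        SpeedChange::NoChange
    }
}
```

Keeping the function free of side effects means the management context alone decides when and how often it runs, and the compiled function remains easy to exchange at runtime.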

Pros:

Cons:

kleinreact commented 1 year ago

A first proposal for an architecture implementing Proposal 1. The memory layouts are given for illustration only, to have a first working example.

Architecture

CB Implementations

CB CCA maps:

CB CMU maps:

Abbreviations

| Short | Long |
| ----- | ---- |
| CB | Crossbar |
| CCA | Clock Control Algorithm |
| CMU | Central Management Unit |
| CPU | Central Processing Unit |
| EB | Elastic Buffer |
| MM | Memory Map |
| RAM | Random Access Memory |
| ro | read only |
| ROM | Read Only Memory |
| rw | read/write |
| WB | Wishbone Interface |

martijnbastiaan commented 1 year ago

I've got a couple of questions / remarks.

  1. The CMU will use the FPGA's onboard oscillator/PLL. The CCA will be controlled by the clock multiplier boards. Will the CMU be responsible for talking to the clock multiplier boards? If not, will the CMU's reset be controlled by circuitry doing the talking? (For context: we already have an FPGA design that can do the talking.)
  2. With regards to reset sequencing: I'm assuming the CMU will control the CCA's reset. Correct?
  3. We need to get debug information from the CCA to the CMU. Would it be an idea to reserve space in the CCA's dmem: an address x storing how much debug info is stored. Debug info stored at x + n where n ~ *x (*x ~ deref x). That way the CMU could monitor x and reset it to 0 when it has transferred everything to the host. The CCA would then be two functions: the program running on the CCA CPU and one interpreting the debug output on the host. (A rough sketch of this idea follows below.)
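A minimal sketch of this mailbox idea, assuming a hypothetical reserved region in the CCA's dMem; the base address, capacity, and helper name below are made up for illustration:

```rust
/// Start of the reserved debug region in the CCA's dMem (hypothetical address).
const DEBUG_BASE: *mut u32 = 0x8000_1000 as *mut u32;
/// Maximum number of debug words behind the counter word (hypothetical).
const DEBUG_CAPACITY: usize = 255;

/// Called from the CCA program: append one debug word to the mailbox.
/// `*DEBUG_BASE` holds how many words are valid; the words themselves live
/// directly behind it. The CMU monitors the counter and resets it to 0 once
/// it has transferred everything to the host.
unsafe fn debug_write(word: u32) {
    let count = DEBUG_BASE.read_volatile() as usize;
    if count < DEBUG_CAPACITY {
        DEBUG_BASE.add(1 + count).write_volatile(word);
        DEBUG_BASE.write_volatile((count + 1) as u32);
    }
    // If the mailbox is full the word is dropped; an overflow flag could be
    // added so the host-side interpreter can detect this.
}
```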
kleinreact commented 1 year ago

> The CMU will use the FPGA's onboard oscillator/PLL. The CCA will be controlled by the clock multiplier boards. Will the CMU be responsible for talking to the clock multiplier boards?

If the CCA is active, then only the CCA should be able to change the state of the multiplier boards. If the CCA is not active, e.g., because it is currently being updated or paused, then I still don't see any requirement for the CMU to change the boards' state other than resetting them completely to start a new experiment.

> If not, will the CMU's reset be controlled by circuitry doing the talking? (For context: we already have an FPGA design that can do the talking.)

What talking are we actually talking about?

> With regards to reset sequencing: I'm assuming the CMU will control the CCA's reset. Correct?

That's correct. This is what the CCA control bits are intended for.

> We need to get debug information from the CCA to the CMU. Would it be an idea to reserve space in the CCA's dmem: an address x storing how much debug info is stored. Debug info stored at x + n where n ~ *x (*x ~ deref x). That way the CMU could monitor x and reset it to 0 when it has transferred everything to the host. The CCA would then be two functions: the program running on the CCA CPU and one interpreting the debug output on the host.

I don't think we should make things too complicated here. I would prefer not putting any restrictions on the executed CCA code at all. In particular, the CCA code should not be constrained by how the CMU debugging works; otherwise we might limit the CCA's capabilities even though we don't have to.

Technically, every action of the CCA can be observed via the CCA's iMem and dMem operations. The only bit that is currently missing is the instruction pointer of the CCA, but that can be added as well. Clearly, observing all of the CCA's state is hard (unless we run the CCA much slower than the CMU), but observing only a few dedicated dMem addresses should be fine and also sufficient for standard debugging tasks. Usually, you only need monitoring capabilities like

`if <iMem instruction @addr X> gets executed, then get the content of <dMem @addr A_1>, .. <dMem @addr A_n>`
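For illustration, such a rule could also be written down as a small data record for the CMU to act on; the Rust names and example addresses below are hypothetical:

```rust
/// "If the instruction at `trigger_pc` gets executed, then get the content
/// of the dMem addresses listed in `watched_addrs`."
pub struct WatchRule<'a> {
    /// iMem address whose execution triggers the rule.
    pub trigger_pc: u32,
    /// dMem addresses A_1 .. A_n to sample when the rule fires.
    pub watched_addrs: &'a [u32],
}

/// Example: when the instruction at 0x0040 executes, sample dMem
/// addresses 0x100 and 0x104.
pub const EXAMPLE_RULE: WatchRule<'static> = WatchRule {
    trigger_pc: 0x0040,
    watched_addrs: &[0x100, 0x104],
};
```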

Or do you see any other requirements here?