lmbollen opened 1 year ago
Some first thoughts right below. I primarily focused on the FPGA setup, because that is where I see most of the open decisions at the moment. The proposals mentioned below should be seen as a starting point for future discussions and can be reworked from scratch if required.
`elasticBuffer`s (optional)

One of the major challenges is the configuration management of the runtime data on each FPGA while simultaneously running the CCA. There are two proposals for realizing this:
**Proposal 1:** This approach builds on a simpler concept, but is more resource-heavy on the FPGA. The main idea is to spend an extra RISC-V soft core just on handling all the management tasks for loading and running the CCA. In particular, this core will be responsible for, among other things, managing the `elasticBuffer`s and the outcome of the CCA. The second core just runs the CCA. It only has access to the `elasticBuffer` data and the network topology, and produces the calculated clock modification strategies.
**Proposal 2:** This approach only requires one CPU core, but restricts the execution context of the CCA. As in Proposal 1, all management tasks are still implemented in software, but they are executed on the same RISC-V soft core as the CCA. Running the CCA can then be seen as calling a special-purpose `main` function in C or Rust, where the topology and buffers are passed as arguments and which returns the clock adaption strategy instead of an `int`. Note that we especially choose `main` for this analogy, since `main` is the function that is compiled to an executable, which is exchangeable at runtime. The function is then called repeatedly by the management context, and can (if necessary) also be scheduled at fixed times with constant frequency.
Below is a first proposal for an architecture implementing Proposal 1. The memory layouts are only given for illustration, to have a first working example.
- `EB`s, `CPU CCA`, `CB CCA`, and `CC` are considered to be part of the `CCA Module`.
- `CPU CMU`, `CB CMU`, `iMem CMU`, and `dMem CMU` are considered to be part of the `CMU Module`.
- `dMem CCA` and `iMem CCA` are shared between both modules.
- `CB CCA` maps (see the access sketch after the abbreviation table below):
  - the `EB`s to the addresses of `dMem CCA` from `64` to `64 + (n-1) * s_e` (read only)
  - address `64 + n * s_e` of `dMem CCA` to the clock control interface (write only)
- `CB CMU` maps:
  - address `4` of `dMem CMU` to `rst` of the `CCA Module`
  - address `4` of `dMem CMU` to `ena` of the `CCA Module`
  - `iMem CCA` to `iMem CMU`, starting at address `<a_i>`, which is stored at address `8`
  - `dMem CCA` to `dMem CMU`, starting at address `<a_d>`, which is stored at address `12`
- If `rst` is low and `ena` is high (see the control bits above), then the mapped `iMem CMU` and `dMem CMU` are read-only.

Short | Long |
---|---|
CB | Crossbar |
CCA | Clock Control Algorithm |
CMU | Central Management Unit |
CPU | Central Processing Unit |
EB | Elastic Buffer |
MM | Memory Map |
RAM | Random Access Memory |
ro | read only |
ROM | Read Only Memory |
rw | read/write |
WB | Wishbone Interface |
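For illustration, this is roughly how CCA code could use the `CB CCA` mapping from the example layout above; the entry size `s_e`, the register width, and the command encoding are assumptions:

```rust
use core::ptr::{read_volatile, write_volatile};

const N_LINKS: usize = 7;  // number of incoming links (example value)
const S_E: usize = 4;      // assumed size of one EB entry in bytes (s_e)
const EB_BASE: usize = 64; // EBs are mapped read-only at 64 .. 64 + (n-1) * s_e
const CLOCK_CTRL: usize = EB_BASE + N_LINKS * S_E; // write-only clock control word

/// Read the datacount of elastic buffer `i` through `dMem CCA`.
fn read_eb(i: usize) -> u32 {
    unsafe { read_volatile((EB_BASE + i * S_E) as *const u32) }
}

/// Write a clock modification command to the clock control interface.
fn write_clock_ctrl(cmd: u32) {
    unsafe { write_volatile(CLOCK_CTRL as *mut u32, cmd) }
}
```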
I've got a couple of questions / remarks.
- We need to get debug information from the CCA to the CMU. Would it be an idea to reserve space in the CCA's dmem: an address `x` storing how much debug info is stored? Debug info is then stored at `x + n`, where `n ~ *x` (`*x` ~ deref `x`). That way the CMU could monitor `x` and reset it to `0` when it has transferred everything to the host. The CCA would then be two functions: the program running on the CCA CPU and one interpreting the debug output on the host. (See the rough sketch below.)
- The CMU will use the FPGA's onboard oscillator/PLL. The CCA will be controlled by the clock multiplier boards. Will the CMU be responsible for talking to the clock multiplier boards?
If the CCA is active, then only the CCA should be able to change the state of the multiplier boards. If the CCA is not active, e.g., because it is currently being updated or paused, then I still don't see any requirement for the CMU to change the boards' state, other than resetting them completely to start a new experiment.
> If not, will the CMU's reset be controlled by circuitry doing the talking? (For context: we already have an FPGA design that can do the talking.)
What `talking` are we actually talking about?
> With regards to reset sequencing: I'm assuming the CMU will control the CCA's reset. Correct?
That's correct. This is what the CCA control bits are intended for.
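For illustration, toggling these control bits from the CMU side could look roughly like the sketch below; the bit positions inside address `4` of `dMem CMU` are assumptions:

```rust
use core::ptr::write_volatile;

const CCA_CTRL: usize = 4;    // word in dMem CMU mapped to the CCA control bits
const CTRL_RST: u32 = 1 << 0; // assumed bit position of `rst`
const CTRL_ENA: u32 = 1 << 1; // assumed bit position of `ena`

/// Hold the CCA in reset, let the caller update the mapped iMem/dMem
/// (per the memory map, they are only read-only while the CCA runs),
/// then release reset and enable the CCA again.
fn restart_cca(load_program: impl FnOnce()) {
    unsafe { write_volatile(CCA_CTRL as *mut u32, CTRL_RST) }; // rst high, ena low
    load_program();
    unsafe { write_volatile(CCA_CTRL as *mut u32, CTRL_ENA) }; // rst low, ena high
}
```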
> We need to get debug information from the CCA to the CMU. Would it be an idea to reserve space in the CCA's dmem: an address `x` storing how much debug info is stored? Debug info stored at `x + n` where `n ~ *x` (`*x` ~ deref `x`). That way CMU could monitor `x` and reset it to `0` when it has transferred everything to the host. The CCA would then be two functions: the program running on the CCA CPU and one interpreting the debug output on the host.
I don't think we should make things too complicated here. I would prefer not putting any restriction on the executed CCA code at all. In particular, the CCA code should not be constrained by how the CMU debugging works; otherwise we might limit the CCA's capabilities even though we don't have to.
Technically, every action of the CCA can be observed via the CCA's iMem and dMem operations. The only bit that is currently missing is the instruction pointer of the CCA, but that one can be added as well. Clearly, observing all state of the CCA is hard (unless we run the CCA much slower than the CMU), but observing only several dedicated dMem addresses should be fine and also sufficient for standard debugging tasks. Usually, you only need monitoring capabilities like

`if <iMem instruction @addr X> gets executed, then get the content of <dMem @addr A_1>, .., <dMem @addr A_n>`
Or do you see any other requirements here?
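A minimal sketch of what such a rule could look like in software, assuming the CMU can observe the CCA's instruction pointer and read the shared dMem; all names are hypothetical:

```rust
/// "if <iMem instruction @addr X> gets executed,
///  then get the content of <dMem @addr A_1>, .., <dMem @addr A_n>"
struct Watchpoint<const N: usize> {
    trigger: u32,      // iMem address X
    watched: [u32; N], // dMem addresses A_1 .. A_n
}

impl<const N: usize> Watchpoint<N> {
    /// Called whenever the observed instruction pointer changes; samples
    /// the watched dMem addresses when the trigger address is hit.
    fn on_fetch(&self, pc: u32, read_dmem: impl Fn(u32) -> u32) -> Option<[u32; N]> {
        (pc == self.trigger).then(|| self.watched.map(read_dmem))
    }
}
```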
The mining rig will eventually contain up to 9 FPGAs that all contain one (or more?) bittide domains. Each domain is controlled by a clock control algorithm that will run on a RISC-V soft-core (VexRiscV). For each domain, every incoming link is connected to an `elasticBuffer`. These elastic buffers produce `datacount`s that represent the number of elements in the buffer. The clock control algorithm uses the `datacount`s of all incoming links to control its own frequency.

In the end we'd like to have a hardware experimentation platform which is remotely accessible, presumably through GitHub Actions. This experimentation platform can be used to experiment with different configurations for the clock control algorithm for different topologies.
Currently we expect to require the following features:
Since this is mostly conceptual work, features can be added, dropped, or moved elsewhere. This issue can be closed when we have a conceptual overview of all the steps that have to be performed when running an experiment via GitHub Actions, alongside an architectural overview of the required components.