Open callumforrester opened 10 months ago
@coretl would be interested in your opinions
For solution one: We need to wait until the FPGA updates all outputs and it's hard to know when outputs will arrive. Outputs may not be related to inputs.
More info from a call with @coretl
Solution 2 makes sense, but we should be careful about propagation of consequences. See the example below where the detectors must be kept aware of the motor positions (for example, to generate simulated data).
Detector 1 is first triggered directly by X in t1, which is when it receives an update of X's position. It then has to cache the position until it receives a trigger in t3, when it is actually supposed to take a frame. There is a danger that X moves in that time, but if it does it will generate extra ticks, which propagate to the detector and allow it to update its cached value.
The case of Y to detector 2 is more complicated, with no link from Y updating to the PandA. The PandA just generates triggers every 4ns using a clock. That means that if Y is not updated, the clock could trigger the detector for many ticks. The detector would have an old value of Y and would produce the same data for each tick. Again, this should not matter because if Y is updated the detector should cache a new value.
This system works, but it does showcase where the event-driven nature of tickit clashes with the sampling-driven nature of a real experiment. If the FPGA triggers detector 2 at a higher frequency than Y is updated, detector 2 will contain striated data because it will use many cached versions of Y.
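The striation effect can be sketched in a few lines. This is only an illustration, not tickit code: `CachingDetector`, `run_striation_demo`, the periods and the 1 unit/ns motor speed are all hypothetical, and the motor period is assumed to be a multiple of the trigger period.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachingDetector:
    """Hypothetical detector that remembers the last motor position it was sent."""

    cached_y: float = 0.0
    frames: List[float] = field(default_factory=list)

    def on_position(self, y: float) -> None:
        # Called only when the motor ticks and pushes a new value.
        self.cached_y = y

    def on_trigger(self) -> None:
        # Called on every FPGA trigger; records whatever value is cached.
        self.frames.append(self.cached_y)


def run_striation_demo(
    trigger_period_ns: int, motor_period_ns: int, duration_ns: int
) -> List[float]:
    detector = CachingDetector()
    for t in range(0, duration_ns, trigger_period_ns):
        if t % motor_period_ns == 0:
            # Assumption: the motor moves at 1 unit per ns but only
            # reports its position at its own (slower) update rate.
            detector.on_position(float(t))
        detector.on_trigger()
    return detector.frames


striated_frames = run_striation_demo(
    trigger_period_ns=8, motor_period_ns=32, duration_ns=64
)
print(striated_frames)  # [0.0, 0.0, 0.0, 0.0, 32.0, 32.0, 32.0, 32.0]
```

The motor moved continuously, but the detector only ever sees two distinct positions, each repeated for four frames.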
This is okay if the trigger frequency of each device reflects the real world, but it may not due to CPU constraints. A motor cannot be simulated at the true rate of a pmac, for example. This is the problem that tickit's original zero-time-ticks are meant to solve.
The data striation is the reason this design makes the simulation more constrained and lower fidelity. The simulation is still useful, even if it is less general-purpose.
One possible solution is to add a mechanism for passing "curves" rather than scalar values to the detector, so it can evaluate Y on its own until Y updates it with a new curve. A curve could be a function against time or a lookup table. This may be a useful additional feature to add once we have this working.
Why doesn't the FPGA Scheduler just ask to be updated every 8ns until it is in a stable state, with knowledge of which blocks inside it must compute at the next step stored internally? This way the FPGA Scheduler doesn't need to "own" time and simulation accuracy is preserved.
Not sure if there's a way to reliably detect that it is in a stable state, but I defer to @coretl
There is no way to know when there is a stable state. An input to a PandA ripples through a series of blocks and may or may not produce an output. One thing we discussed is terminating the tick whenever it gets to a PandA, which would then schedule a callback for 8ns' time until it was complete. This left us wondering if there was any value in the graph traversal of the standard scheduler, and whether it made more sense to make every transition take time, and do everything in the FPGA way.
Iirc, we had this conversation way back at the start and decided that adding delays to wires was going to cause far more issues than it solved. Surely if none of the blocks within the PandA change their state during a step then it would be considered stable?
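For reference, the "no block changed state during a step" check could look something like the sketch below. It assumes every block exposes its complete state for comparison, which may not hold for real PandA blocks. `Block`, `step` and `run_until_stable` are hypothetical names. A free-running block such as a clock never satisfies the check, so a step limit is needed.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
# Assumption: each block computes its next value from the previous state of all blocks.
Block = Callable[[State], int]


def step(blocks: Dict[str, Block], state: State) -> State:
    # Every block reads the previous state, so evaluation order does not matter.
    return {name: block(state) for name, block in blocks.items()}


def run_until_stable(
    blocks: Dict[str, Block], state: State, max_steps: int
) -> Tuple[State, int, bool]:
    """Step until a step changes nothing, or give up after max_steps."""
    for steps in range(1, max_steps + 1):
        new_state = step(blocks, state)
        if new_state == state:
            return state, steps, True
        state = new_state
    return state, max_steps, False


# A three-block chain: "a" holds its value, "b" copies "a", "c" copies "b".
chain: Dict[str, Block] = {
    "a": lambda s: s["a"],
    "b": lambda s: s["a"],
    "c": lambda s: s["b"],
}
final_state, steps_taken, stable = run_until_stable(
    chain, {"a": 1, "b": 0, "c": 0}, max_steps=10
)

# A block that toggles every step never settles.
oscillator: Dict[str, Block] = {"osc": lambda s: 1 - s["osc"]}
_, _, osc_stable = run_until_stable(oscillator, {"osc": 0}, max_steps=10)
```

The chain settles after two steps and a third step confirms it. The oscillator hits the step limit, which is the case where "stable" is never reached.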
Problem
Simulating FPGAs is difficult. Tickit differs from the FPGA simulation on which it was based in the following ways.
Tick Scope
Tickit does not allow cyclic graphs because it considers a tick to be a single, instantaneous propagation of the entire device from the lowest down nodes that require wakeup. The following is a valid graph showing which nodes are visited in which ticks (t0, t1 and t2):
If a cycle were inserted anywhere then t0 would never end, because the tick only ends when the output propagates all the way to D.
The FPGA design allows cycles by having a reduced scope for a tick, only ticking each subsequent node:
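The two tick scopes can be contrasted with a toy traversal. This is a simplified sketch rather than the real scheduler logic: the four-node graph and the names `full_propagation` and `single_hop_tick` are made up for illustration, and the full propagation is a naive breadth-first walk rather than tickit's actual ordering.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def full_propagation(graph: Graph, start: str, max_visits: int) -> List[str]:
    """Tickit-style scope: one tick visits everything downstream of start."""
    visited: List[str] = []
    frontier = [start]
    while frontier:
        if len(visited) >= max_visits:
            raise RuntimeError("tick never ends: the graph contains a cycle")
        node = frontier.pop(0)
        visited.append(node)
        frontier.extend(graph[node])
    return visited


def single_hop_tick(graph: Graph, frontier: Set[str]) -> Set[str]:
    """FPGA-style scope: one tick moves the signal one hop along each wire."""
    return {sink for node in frontier for sink in graph[node]}


acyclic: Graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
cyclic: Graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}

# One tick covers the whole acyclic graph.
one_tick = full_propagation(acyclic, "A", max_visits=100)

# The same traversal on a cyclic graph never terminates on its own.
try:
    full_propagation(cyclic, "A", max_visits=100)
    cycle_detected = False
except RuntimeError:
    cycle_detected = True

# With single-hop ticks the cycle is harmless: each tick ends after one hop.
frontier = {"A"}
history = []
for _ in range(5):
    history.append(sorted(frontier))
    frontier = single_hop_tick(cyclic, frontier)
```

Here `history` shows the signal returning to A on the fifth tick, something the full-propagation model cannot express.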
Temporal Relevance
In tickit a tick is assumed to be instantaneous: even if it is propagating a signal through a large and complex graph, no simulation time passes. An arbitrary number of ticks can also take place in zero simulated time. In the example below, only tick 3 takes place after 0 nanoseconds:
Time is much more important to FPGA ticks: a delay is enforced between them, and "event x must happen n ticks (clock cycles) later than event y" is a valid use case.
The example below shows a simple traversal with no wakeups:
The graph below shows each propagation stage (tock) of each tick. A tock is defined as a single transfer of output of one node to input of another node, or an initial trigger of a node at the beginning of a tick.
This implies one more visit to D than in the tickit model, showing a fundamental difference that cannot be achieved by inserting an artificial delay.
Causality
Both of the above examples show how tickit treats causality differently to the FPGA model. It traverses the whole graph instantly to propagate the consequences of events before any time can pass that introduces new events. Thus something at one end of a graph affects something at the other end in 0 sim time.
Proposed Solutions
Hybrid Schedulers
Write a child scheduler that works like the FPGA simulation. Keep the existing master scheduler for wiring all devices outside of the FPGA simulation and reconciling the outputs. See example below:
Several issues with this approach are illustrated here:
NestedScheduler works, but they also each have their own separate concepts of simulated time.
These are not necessarily showstoppers as long as we accept this potential inaccuracy in the simulation.
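The separate concepts of simulated time can be made concrete with a small sketch. Both classes are hypothetical stand-ins rather than tickit classes, and the 8ns clock period is an assumed default.

```python
from typing import List


class ChildFpgaScheduler:
    """Hypothetical child scheduler that clocks a chain of blocks on its own time."""

    def __init__(self, chain_length: int, clock_period_ns: int = 8) -> None:
        self.chain_length = chain_length
        self.clock_period_ns = clock_period_ns
        self.internal_time_ns = 0

    def run(self, value: int) -> int:
        # One clock cycle per block in the chain; only internal time advances.
        for _ in range(self.chain_length):
            self.internal_time_ns += self.clock_period_ns
        return value


class MasterScheduler:
    """Hypothetical master that treats the whole child run as one zero-time tick."""

    def __init__(self, child: ChildFpgaScheduler) -> None:
        self.child = child
        self.time_ns = 0
        self.log: List[str] = []

    def tick(self, value: int) -> int:
        # The master's clock does not move while the child runs.
        output = self.child.run(value)
        self.log.append(f"t={self.time_ns}ns output={output}")
        return output


child = ChildFpgaScheduler(chain_length=3)
master = MasterScheduler(child)
master_output = master.tick(1)
```

After one master tick the child believes 24ns have passed while the master still reports 0ns. That gap is the inaccuracy being accepted.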
Different Master Scheduler
Write a new master scheduler to be used for all simulations involving FPGAs, which makes everything time-sensitive. In this case, all ticks are propagated in the FPGA style. All node dependencies must have a delay of at least 1ns. There is still a separate scheduler for controlling FPGA simulations for performance reasons, but it and the master share a concept of simulated time.
In this version the detectors are all triggered at different times and in the order the FPGA requires. The disadvantage is that these simulations are more restrictive and less generic. They can only accept simulated devices that have a concept of time and all causality is based around the FPGA, which makes it more difficult to simulate non-discrete-time entities such as the behaviour of the beam. That does, however, optimise tickit for the hardware triggered scanning use case.
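A minimal sketch of what a fully time-sensitive scheduler might look like, using a priority queue of timed events. The wiring, node names, delays and the `simulate` function are all hypothetical; the only rule taken from the proposal is that every dependency has a delay of at least 1ns.

```python
import heapq
from typing import Dict, List, Tuple

# Assumption: wiring maps each source to a list of (sink, delay_ns) pairs.
Wiring = Dict[str, List[Tuple[str, int]]]


def simulate(wiring: Wiring, start: str, end_ns: int) -> List[Tuple[int, str]]:
    """Propagate events in time order; every hop costs simulated time."""
    for sinks in wiring.values():
        for sink, delay_ns in sinks:
            if delay_ns < 1:
                raise ValueError(f"wire to {sink} needs a delay of at least 1ns")

    queue: List[Tuple[int, str]] = [(0, start)]
    log: List[Tuple[int, str]] = []
    while queue:
        t_ns, node = heapq.heappop(queue)
        if t_ns > end_ns:
            break
        log.append((t_ns, node))
        for sink, delay_ns in wiring.get(node, []):
            heapq.heappush(queue, (t_ns + delay_ns, sink))
    return log


# The cycle through det2 is allowed because every wire has a delay.
wiring: Wiring = {
    "panda": [("det1", 1), ("det2", 3)],
    "det2": [("panda", 5)],
}
event_log = simulate(wiring, start="panda", end_ns=10)
print(event_log)
# [(0, 'panda'), (1, 'det1'), (3, 'det2'), (8, 'panda'), (9, 'det1')]
```

Each detector is visited at its own time, and the feedback wire no longer makes a tick run forever, because nothing happens in zero simulated time.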