SpiNNakerManchester / PACMAN

Partition and Configuration Manager for SpiNNaker
Apache License 2.0

Allow external device keys to come from multiple FPGA sources #397

Closed rowleya closed 8 months ago

rowleya commented 3 years ago

In the current code, an external device which is connected to an FPGA can only be specified as being fed from a single FPGA id and FPGA link id. We are currently looking at an interface that distributes the traffic across the different FPGA links to get as high a transfer bandwidth as possible. It would therefore be helpful to be able to specify a single external device using ApplicationFPGAVertex but then specify multiple FPGA id/FPGA link id pairs, or even just say "all connections on this board" or "all links from this FPGA".

Some potential thoughts / issues:

Christian-B commented 3 years ago

One idea would be to temporarily use different keys during routing. For example, if you have 4 machine vertices, increase the fixed key mask by an extra 2 bits. In the 4 FPGAs, set neither bit, the first, the second, or both. In the receiver, set the two bits back to zero.

Things to watch out for

  1. There is no other fixed key that the larger key will clash with.
  2. The receiver does not set to zero the bits in keys from other sources.
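
A minimal sketch of the tagging idea above, assuming 32-bit keys. The position of the two spare bits (`TAG_SHIFT`) and the function names are hypothetical and are not taken from PACMAN; the receiver only clears the bits on keys that match the device's own base key, which addresses the second point.

```python
# Sketch only: TAG_SHIFT and all names here are illustrative assumptions.
TAG_BITS = 2                      # enough to distinguish 4 machine vertices
TAG_SHIFT = 16                    # assumed position of two otherwise-unused bits
TAG_MASK = ((1 << TAG_BITS) - 1) << TAG_SHIFT
KEY_WIDTH_MASK = 0xFFFFFFFF       # routing keys are treated as 32-bit values


def widen_mask(mask):
    """Extend a fixed key mask so routing also matches on the tag bits."""
    return mask | TAG_MASK


def tag_key(key, source_index):
    """Set the tag bits for one of the 4 sources (0 = neither bit, 3 = both)."""
    if not 0 <= source_index < (1 << TAG_BITS):
        raise ValueError("source_index does not fit in the tag bits")
    if key & TAG_MASK:
        raise ValueError("key already uses the tag bits")
    return key | (source_index << TAG_SHIFT)


def untag_key(key, device_key, device_mask):
    """Clear the tag bits, but only for keys that belong to the device."""
    if (key & device_mask) == device_key:
        return key & ~TAG_MASK & KEY_WIDTH_MASK
    return key  # keys from other sources are left untouched
```

Here `device_mask` is the original (unwidened) mask, so the match ignores the tag bits themselves.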
rowleya commented 3 years ago

After some investigation, it appears that it is not possible for the device to send the same keys to multiple FPGAs without a lot of disruption. In particular, this would restrict the placement of receiving cores to ensure that they are not on any of the FPGA-connected cores, as otherwise loops will happen.

The current design is that the keys and sending device are set up so that pixels of a source retina that are close to each other are sent to different FPGAs, i.e. the LSBs of the dimension fields in the keys are used to determine which FPGA the pixels are sent to. For example, take a key with fields: | key = 12 bits | polarity = 1 bit | y = 9 bits | x = 10 bits |

If we have 8 FPGA links to send over, the 1 LSB of y and 2 LSBs of x can be used to determine which FPGA link to send over, giving a mask of 0xFFF00403.
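
As a rough check of the arithmetic, the mask can be rebuilt from the field widths given above. The function names and the numbering of the links (y LSB as the high bit of the link index) are assumptions made for this sketch, not something the issue specifies.

```python
# Field layout from the example: | key 12 | polarity 1 | y 9 | x 10 | (32 bits)
X_BITS, Y_BITS, P_BITS, KEY_BITS = 10, 9, 1, 12
X_SHIFT = 0
Y_SHIFT = X_SHIFT + X_BITS        # 10
P_SHIFT = Y_SHIFT + Y_BITS        # 19
KEY_SHIFT = P_SHIFT + P_BITS      # 20


def link_mask(n_x_lsbs=2, n_y_lsbs=1):
    """Mask covering the key field plus the LSBs that select the FPGA link."""
    key_part = ((1 << KEY_BITS) - 1) << KEY_SHIFT
    y_part = ((1 << n_y_lsbs) - 1) << Y_SHIFT
    x_part = ((1 << n_x_lsbs) - 1) << X_SHIFT
    return key_part | y_part | x_part


def link_index(x, y, n_x_lsbs=2, n_y_lsbs=1):
    """Hypothetical link numbering: y LSBs above x LSBs, giving 0..7 here."""
    y_sel = y & ((1 << n_y_lsbs) - 1)
    x_sel = x & ((1 << n_x_lsbs) - 1)
    return (y_sel << n_x_lsbs) | x_sel


print(hex(link_mask()))           # 0xfff00403
```

With this numbering, horizontally adjacent pixels such as (4, 0) and (5, 0) map to links 0 and 1, so neighbouring pixels travel over different links.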

The next challenge with this layout is to send appropriate squares to the appropriate receivers i.e. in the case being considered, the receivers are convolution populations. This means that there are multiple sources of the keys to be received by each target core. Specifically, it is not desirable to receive all the keys at all of the target cores, as this means that the targets have to deal with more keys than they can handle.

Possible ideas:

rowleya commented 3 years ago

Thinking about this again, the aim is that a) the FPGA packets arrive such as to maximize the bandwidth of the reception and b) the cores that receive the packets minimize the number of unwanted packets received. With the above encoding, adjacent pixels are sent to different FPGA links, which would appear to maximize bandwidth. However as it stands, each of these FPGAs will be represented by a single virtual machine vertex. This means that any core that wants to receive any of the pixels being sent by that FPGA link will have to receive all of them. The above then suggests a mechanism that would reduce the load on the cores, but would end up still putting the full load on any router between the FPGA and any target chip, since there will be a single machine edge involved in that link.

This is shown in the picture below. In this picture, all pixels that are received on FPGA link 0 are shown in red, and similarly those from FPGA link 3 are shown in blue, link 4 in green and link 7 in yellow (others are not shown to reduce the cluttering). So to receive the pixels that make up the black square, all the pixels from all the FPGAs must be received at least at the chip.

[Image: Retina FPGA Mapping]

Another way to split things is therefore to have multiple machine vertex sources, each of which groups more local pixels together again. Taking the 640x480 image and the mask of 0xFFF00403, we have split the image into rectangles 4 pixels wide by 2 pixels high (2 rows x 4 columns), where each pixel of a rectangle is sent to a different FPGA. We can now choose to group these rectangles into chunks of e.g. 16x16 pixels (i.e. 8 rows x 4 columns of the rectangles). Each chunk can then be made to appear to be sent by a separate machine vertex of the virtual device, which means that each such vertex sends only a subset of the pixels received by an FPGA. Now to receive the black square in the diagram, the receiver would only need to receive the pixels in that square, which can be done by filtering edges. Receiving the purple square would still require extra pixels to be received, but still fewer than the whole image.
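
The chunk grouping can be sketched as follows, assuming the 640x480 image and 16x16 chunks from the example. The function names and the row-major chunk numbering are illustrative assumptions; how the implemented code actually assigns machine vertices is not described in this thread.

```python
WIDTH, HEIGHT = 640, 480          # image size from the example
CHUNK_W, CHUNK_H = 16, 16         # chunk size from the example
CHUNKS_PER_ROW = WIDTH // CHUNK_W     # 40
CHUNKS_PER_COL = HEIGHT // CHUNK_H    # 30


def chunk_index(x, y):
    """Row-major index of the chunk (source machine vertex) owning a pixel."""
    return (y // CHUNK_H) * CHUNKS_PER_ROW + (x // CHUNK_W)


def chunks_for_region(x0, y0, width, height):
    """Set of chunk indices a receiver must listen to for a rectangle."""
    first_cx, last_cx = x0 // CHUNK_W, (x0 + width - 1) // CHUNK_W
    first_cy, last_cy = y0 // CHUNK_H, (y0 + height - 1) // CHUNK_H
    return {
        cy * CHUNKS_PER_ROW + cx
        for cy in range(first_cy, last_cy + 1)
        for cx in range(first_cx, last_cx + 1)
    }


# A 16x16 region aligned to the chunk grid needs one source vertex;
# the same region shifted by 8 pixels straddles four of them.
aligned = chunks_for_region(16, 16, 16, 16)
shifted = chunks_for_region(8, 8, 16, 16)
```

A receiver whose region is aligned with the chunk grid receives no unwanted pixels, while a misaligned region receives up to four chunks' worth, which is still far fewer than the 1200 chunks of the whole image.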

rowleya commented 8 months ago

Fixed; this is now implemented in current code