byuccl / rs-cad

CAD Tools (Packer, Placer, & Router) for RapidSmith2.
3 stars 1 forks source link

Benchmark status #23

Open DallonTG opened 5 years ago

DallonTG commented 5 years ago

viterbi decoder -- passing bgm -- passing -- Fill in the remainder --

As you know, the bgm benchmark can't finish packing. I spent some time looking at it and will describe what I noticed here in hopes of it being helpful.

The error:

Exception in thread "main" edu.byu.ece.rapidSmith.cad.pack.rsvpack.CadException: No valid pack unit for clusterChain fract_out_q_reg[3]_i_1
    at edu.byu.ece.rapidSmith.cad.pack.rsvpack._RSVPack.packNetlist(RSVPack.kt:195)
    at edu.byu.ece.rapidSmith.cad.pack.rsvpack._RSVPack.pack(RSVPack.kt:83)
    at edu.byu.ece.rapidSmith.cad.pack.rsvpack.RSVPack.pack(RSVPack.kt:35)
    at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow.run(Artix7.kt:55)
    at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow$Companion.main(Artix7.kt:86)
    at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow.main(Artix7.kt)

Here is a snapshot of how Vivado packs this cell.: image

clusterChain fract_out_q_reg[3]_i_1 is the CARRY4 cell and is the seed cell. The Force Routing Prepacker adds cells to the D6, C6, B6, and A6 LUTs. In RSVPack, the A6 LUT cell is a route-through that has been converted into a 1LUT cell. The Table-Based Routability Checker finds two conditional cells: fract_out_q[3]_i_2 and a0_add/fract_out_q_reg[0]. (In Vivado, the fract_out_q[3]_i_2 cell is placed on the A5 LUT BEL and the a0_add/fract_out_q_reg[0] cell is placed on the AFF BEL).

The packer ends up failing when it gets back to the Table-Based Routability checker because it thinks both the A5LUT cell and the A6LUT cells both need to be driven by a different net on the A1 bel pin.

I tried to figure out what was going wrong here for a while and also tried inserting some manual code to force the pass-through cell on the A6LUT Bel to use a different pin, but the packer then ran into another situation almost identical to this one, but with two non-routethrough cells on the fracturable ALUT.

trharoldsen commented 5 years ago

Hmm, sounds like I'm creating the ROUTETHROUGH LUTs incorrectly. I'm not sure I even though to check if the other BEL was being used and what pins it might be using.

On 11/16/18 5:25 PM, Dallon wrote:

As you know, the bgm benchmark can't finish packing. I spent some time looking at it and will describe what I noticed here in hopes of it being helpful.

The error:

|Exception in thread "main" edu.byu.ece.rapidSmith.cad.pack.rsvpack.CadException: No valid pack unit for clusterChain fract_out_q_reg[3]_i_1 at edu.byu.ece.rapidSmith.cad.pack.rsvpack._RSVPack.packNetlist(RSVPack.kt:195) at edu.byu.ece.rapidSmith.cad.pack.rsvpack._RSVPack.pack(RSVPack.kt:83) at edu.byu.ece.rapidSmith.cad.pack.rsvpack.RSVPack.pack(RSVPack.kt:35) at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow.run(Artix7.kt:55) at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow$Companion.main(Artix7.kt:86) at edu.byu.ece.rapidSmith.cad.families.artix7.SiteCadFlow.main(Artix7.kt) |

Here is a snapshot of how Vivado packs this cell.: image https://user-images.githubusercontent.com/16548246/48649694-b2517980-e9b0-11e8-9564-587580fe143b.png

clusterChain fract_out_q_reg[3]_i_1 is the CARRY4 cell and is the seed cell. The Force Routing Prepacker adds cells to the D6, C6, B6, and A6 LUTs. In RSVPack, the A6 LUT cell is a route-through that has been converted into a 1LUT cell. The Table-Based Routability Checker finds two conditional cells: fract_out_q[3]_i_2 and a0_add/fract_out_q_reg[0]. (In Vivado, the fract_out_q[3]_i_2 cell is placed on the A5 LUT BEL and the a0_add/fract_out_q_reg[0] cell is placed on the AFF BEL).

The packer ends up failing when it gets back to the Table-Based Routability checker because it thinks both the A5LUT cell and the A6LUT cells both need to be driven by a different net on the A1 bel pin.

I tried to figure out what was going wrong here for a while and also tried inserting some manual code to force the pass-through cell on the A6LUT Bel to use a different pin, but the packer then ran into another situation almost identical to this one, but with two non-routethrough cells on the fracturable ALUT bel.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/byuccl/rs-cad/issues/23, or mute the thread https://github.com/notifications/unsubscribe-auth/AMdCwp_Y_Ng3pq5CyUNoa8c3dtZGrNNsks5uvztTgaJpZM4YnEHU.

DallonTG commented 5 years ago

If it helps, this is the place in the code where the routing was ultimately deemed infeasible:

if (rowStatus.feasibility != Routability.INFEASIBLE) {
    // check if the source is already being used
    val net = cellPin.net!!
    val sourceNet = rowStatus.claimedSources[result.claimedSource]
    if (sourceNet != null && sourceNet != net) {
        // the source is already being used by another net and this row
        // is therefore invalid
        rowStatus.feasibility = Routability.INFEASIBLE
        return // exit early
    } else ...

This is in TableBasedRoutabilityChecker.kt, starting at line 558 in master.

trharoldsen commented 5 years ago

Here is what is occurring.

The error is occurring from a combination of the LUT-ROUTETHROUGH (Artix7.kt) inserter and the PinMapper (Artix7.kt). Prior to packing, to remove the trouble of LUT routethroughs, the algorithm creates a LUT for each CARRY4 (C4) input that is not driven by a LUT. This guarantees that a LUT exists just before the C4 inputs and the routing feasibility check (rfc) does not need to support route throughs.

This insertion is performed by looking at each of the inputs to a C4 or MUXF7 cell and, if the cell preceding it is not a LUT, creating a LUT1 cell programmed with the identity function with the net driving the C4 pin as the input and inserting it between the net and the C4 pin.

When packing, just prior to performing the rfc, the cell pin -> bel pin mappings have to be ascertained to determine what paths through the cluster are needed. Typically, this mapping is trivial, but in a few cases where multiple mappings exist for some cells, some predictions are required. The packer currently performs these predictions for the C4 CIN input, but the LUT inputs are performed via a dumb I0-A1, I1-A2, etc type mapping.

In the case we're seeing, a LUT route through has been inserted for the 6LUT but not for the 5LUT. I'm not sure if this is because the design originally had

  1. a LUT on the 5LUT bel but not on the 6LUT bel driving the C4 or
  2. both the 5LUT and 6LUT bels were route throughs, but the DI# pin on the C4 was driven by a LUT cell while the S# pin on the CR4 was driven by a FF or some other non-LUT cell.

I'm guessing the 2nd case is what's happening, but I haven't confirmed that.

Either way, while packing the cluster of interest, the C4 requires adding both the LUT cell in front of the DI# pin, cell X, (which will end up on the 5LUT BEL) and the inserted route through LUT in front of the S# pin, cell Y, (which will end up on the 6LUT BEL). Upon reaching the rfc, the pins will be assigned. Using the naive approach mentioned above, the pin mappings will be something like cell X.I0 -> A1, X.I1 ->A2, etc and cell Y.IO -> A1. These nets requesting A1 are obviously going to be different resulting in a resource conflict on the pin. This is then causing the rfc to fail (as it should)

trharoldsen commented 5 years ago

Thoughts on fixes.

  1. It really feels like the packer needs to support some form of pin permutability. With LUT equations getting reduced to LUT1, LUT2 etc sized LUTs, we're losing a lot of chances to combine similar LUTs onto a single LUT. This issue also popped up when I tried to support the LUTNM stuff.
  2. I'm worried that we could see a case where the 5LUT in the discussed case is using all 5 pins, in which case the LUT5 and LUT6 route through cells could never be packed together. As a case study, do we ever see instances where one portion of the fracturable LUT is used normally and the other is used as a routethrough? I not, we can just assume that if placing a route through on one C4 input, we need a routethrough on the other C4 instance.

Possible paths for fixes -- not all of these (or maybe even any of these) are complete solutions by themselves:

  1. Support LUT routethroughs in our pack unit representation. This would remove the need for inserting LUT routethrough cells. It may be difficult though to do (I'm not sure how hard nor sure if it would introduce any new issues). I'm worried that it might also multiple the length of our routing feasibility table by 2^6 (or worse) as the LUT would effectively become a 6->1 mux.

  2. Ensuring all previously existing LUT routethroughs are inserted. Discussed above. I'm not sure how to ensure this is accomplished. We don't want to insert new LUTs everywhere when they aren't needed, but we likely need a better approach than currently exists. If we make the assumption that LUT configured as route throughs can only be route throughs for both the O5 and O6 pins, that would simplify things. We could also perform an analysis of the number of nets between the LUT5 and LUT6 cells (if both exist) to ensure they do not exceed 5 (or 6?) pins.

  3. Support permutable pin mappings in the rfc Not positive how to do this. If can't be a general case that all inputs to the LUT BELs are permutable since the lutram input cells are fixed. Would we perform multiple routing checks with different permutations?

  4. Support one-of type sink pins in the rfc (meaning net A can go to any one of pins A1-A6). Interesting, but I'm not sure exactly how it would work. I'd need to think about it some more.

  5. Better pin mapping algorithm Possibly the easiest approach. I've mapped out a few places that would need to be changed (it's more than just a few lines unfortunately). First, the pin mapping piece of code (found in Artix7.kt) would need to be modified to check both the LUT5 and LUT6 cells and assign pin mappings in a deterministic, repeatable manner -- doable, mostly. However, the way the code is structured, I have no means of knowing whether the LUT5 or the LUT6 was packed in first, or if they were packed simultaneously. This makes it challenging to determine what pins were mapped where in a previous invocation and could mean that the pin mappings change if the other half of the fracturable LUT gets used. This is problematic for the rfc algorithm as it expects pin mappings to remain constant from one invocation to the next (the rfc is designed to be reentrant). Additionally, any history in the pin mapper would need to be capable of being rolled back -- a capability not supported right now. If we take this route, the rfc would likely need to be updated to store the previous pin assignments, test if they have changed between invocations, and reset the table if they have changed. This might lead to a slow down though I'm not sure how much. Alternatively, we could pass in the pin mapping history in the rfc to the pin mapper to guide it in maintaining consistency. This somewhat couples the rfc to pin mapper algorithm which I've tried to avoid, but if it's necessary, it's necessary.

Let me know your thoughts if you have any on this.

DallonTG commented 5 years ago

Your description of the problem makes sense. It is what I thought was happening until I tried forcing the route-through to use a different pin and I ran into another problem.

In the case we're seeing, a LUT route through has been inserted for the 6LUT but not for the 5LUT. I'm not sure if this is because the design originally had

  1. a LUT on the 5LUT bel but not on the 6LUT bel driving the C4 or
    1. both the 5LUT and 6LUT bels were route throughs, but the DI# pin on the C4 was driven by a LUT cell while the S# pin on the CR4 was driven by a FF or some other non-LUT cell.

I'm guessing the 2nd case is what's happening, but I haven't confirmed that.

In the case we're seeing, the LUT on the 5bel was not a route-through (it was an inverter). The S# pin on the CARRY4 was driven by a FF and so a route-through was inserted.

As a case study, do we ever see instances where one portion of the fracturable LUT is used normally and the other is used as a routethrough? I not, we can just assume that if placing a route through on one C4 input, we need a routethrough on the other C4 instance.

This is one of the cases I have observed most often - where one portion is used normally and the other is a LUT. So I don't think we can make that assumption.

I'm not sure which of your proposed solutions sounds best to me yet. But I also got the impression that the the packer needs some sort of pin permutability. Since this also caused you problems with LUTNMs, I think an ideal solution would address this and not just be limited to the route-through issue. Could you describe a bit more of the challenges you faced with LUTNMs? Was the problem with both SOFT_HLUTNM and HLUTNM properties, or just SOFT_HLUTNM properties?

trharoldsen commented 5 years ago

I've updated this to reflect benchmark status now.

nelsobe commented 5 years ago

Question: where do we sit w.r.t. pin permutability? The benchmark circuits go through the packer now but are they correct or do they need pin permutability code added?

trharoldsen commented 5 years ago

Pin permutability should already be merged into the main branch.

On 12/10/18 5:05 AM, nelsobe wrote:

Question: where do we sit w.r.t. pin permutability? The benchmark circuits go through the packer now but are they correct or do they need pin permutability code added?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/byuccl/rs-cad/issues/23#issuecomment-445759227, or mute the thread https://github.com/notifications/unsubscribe-auth/AMdCwt3yOhqumGMu9-L6QtRjaKXk6YHjks5u3jHqgaJpZM4YnEHU.